Abstract

Optimizing operations at plug-in hybrid electric vehicle (PHEV) battery swap stations is motivated by the movement to make transportation cleaner and more efficient. A PHEV swap station allows PHEV owners to quickly exchange their depleted PHEV battery for a fully charged battery. The PHEV-Swap Station Management Problem (PHEV-SSMP) is introduced, which models battery charging and discharging operations at a PHEV swap station facing nonstationary, stochastic demand for battery swaps, nonstationary prices for charging depleted batteries, and nonstationary prices for discharging fully charged batteries. Discharging through vehicle-to-grid (V2G) technology is beneficial because it aids power load balancing. The objective of the PHEV-SSMP is to determine the optimal policy for charging and discharging batteries that maximizes expected total profit over a fixed time horizon. The PHEV-SSMP is formulated as a finite-horizon, discrete-time Markov decision problem, and an optimal policy is found using dynamic programming. Structural properties are derived, including sufficient conditions that ensure the existence of a monotone optimal policy. A computational experiment is developed using realistic demand and electricity pricing data. The optimal policy is compared to two benchmark policies that are easily implementable by PHEV swap station managers. Two designed experiments are conducted to obtain policy insights regarding the management of PHEV swap stations. These insights include the minimum battery level in relation to the number of PHEVs in a local area, the incentive necessary to discharge, and the viability of PHEV swap stations under many conditions.
OPTIMAL POLICIES FOR THE MANAGEMENT OF A
PLUG-IN HYBRID ELECTRIC VEHICLE SWAP STATION

I. Introduction

Optimizing operations at plug-in hybrid electric vehicle (PHEV) battery swap stations is motivated by the movement to make transportation cleaner and more efficient. In January 2014, U.S. Energy Secretary Ernest Moniz announced a $50 million budget for research on vehicle technologies, which will also aid the initiative launched in March 2012 to make plug-in electric vehicles more convenient and affordable over the next 10 years [1]. This thesis approaches that initiative by considering the optimal management of PHEV battery swap stations. A PHEV battery swap station allows the PHEV owner to exchange a depleted battery for a fully charged one. Swap stations not only offer PHEV owners the convenience of swapping their battery, but also create the opportunity to control battery charging, reducing the negative effect of increased electricity demand on the power grid [2, 3] and reducing the difference between high-peak and low-peak energy prices [4].

The concept of battery swap stations for PHEVs was initially developed by the Israeli company Better Place, which financially collapsed in May 2013 [5]. Despite Better Place's collapse, such swap stations remain of great interest, as the manufacturing of PHEVs is on the rise and the motivation to switch from gasoline to battery power has not diminished. According to the Department of Energy [1], Americans purchased nearly 100,000 plug-in electric vehicles in 2013, almost twice as many as in 2012.
One of the leading electric car manufacturers, Tesla, first gained worldwide attention when it released the first ever mass-produced electric sports car in 2010 [6]. The Tesla Model S (sedan) is the current model available for purchase and is priced at $71,070 with the 60 kWh battery, $81,070 with the 85 kWh battery, and $94,570 for the 85 kWh performance model. The Model X (crossover) has recently been unveiled and is currently available for reservation, with delivery expected in Fall 2015 [7]. According to Tesla founder and CEO Elon Musk, a third model will be released in 2017 at a cost of $35,000 [8]. It will be called the Model 3 and will be a direct rival of the current BMW 3 Series. Many other vehicle manufacturers are also rolling out electric vehicles. Honda, BMW, Chevrolet, Ford, Nissan, Cadillac, Fiat, Mercedes, Mitsubishi, SMART, Volkswagen, Kia, and Toyota all offer at least one electric vehicle, with prices ranging from $23,800 for the Mitsubishi i-MiEV to $137,000 for the 2014 BMW i8 [9].

In addition to being one of the leading electric car manufacturers, Tesla is also the frontrunner in charging stations. There are currently 129 Tesla Supercharger stations in North America, 95 in Europe, and 36 in Asia [10]. Electric car owners can plug in their car at a Supercharger station and charge at up to 120 kW; 30 minutes of charging, at no cost to the consumer, provides 170 miles of travel for the Model S 85 kWh battery option. While this is a great option for PHEV owners, it still requires waiting while the battery charges, and charging points may become congested as the number of PHEVs purchased continues to increase. Battery swap stations provide a fast and convenient way to drive away with a fully charged battery. Tesla presented the idea of swap stations in June 2013, but they have not yet come to market [11].

Widely available battery swap stations will support the initiative launched in March 2012 by the Department of Energy [1] to make plug-in electric vehicles more convenient and affordable. They will also help control battery charging to avoid the loss of power and power quality that can occur when batteries are charged during high-peak demand for electricity [2]. An ancillary benefit of a swap station is the ability to coordinate discharging back to the power grid through vehicle-to-grid (V2G) technology [12]. When the charging and discharging of batteries is properly coordinated with the power grid, load balancing can occur [13–15].

Given the significant impact swap stations can have on the growing market for battery-powered vehicles, it is valuable to develop a model that optimizes operations at a swap station. As such, this thesis presents a model that considers uncertainty in battery swap demand and nonstationary charging costs to obtain realistic results that are robust to the stochasticity of the system. The PHEV-Swap Station Management Problem (PHEV-SSMP) is considered, and a Markov decision process model [16] is developed. Markov decision processes characterize problems of discrete-time sequential decision making under uncertainty and can be solved using dynamic programming. They can be modeled using finite or infinite horizons. Infinite-horizon models allow the determination of a stationary optimal policy, meaning that the optimal action depends on the state but not on time. Nonstationary Markov decision processes relax the assumption that problem data do not change with time and are, in general, unsolvable using infinite-horizon models due to infinite data requirements [17]. A finite-horizon model is considered because the problem data used in the PHEV-SSMP are highly variable with respect to time. The nonstationary problem data include the mean demand for battery swaps, the charging price for batteries, and the revenue from discharging batteries back to the power grid. In a sequential decision-making model, the state of the system is observed at a certain point in time and an action is taken. The action results in an immediate reward to the decision maker, and the system transitions to a new state according to a probability distribution determined by the chosen action.
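The dynamic programming idea above can be made concrete with a short backward induction sketch. This is a generic, minimal Python illustration written for this discussion, not the thesis's implementation; the function name `backward_induction`, its callback arguments, and the default terminal reward of zero are assumptions. Working backward from the last decision epoch, it evaluates v_t(s) = max over a of [ r_t(s, a) + sum over j of p_t(j | s, a) v_{t+1}(j) ] for every state and records a maximizing action. Because the callbacks receive the epoch t, time-varying (nonstationary) rewards and transition probabilities are handled naturally.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int


def backward_induction(
    horizon: int,
    states: List[State],
    actions: Callable[[int, State], List[Action]],
    reward: Callable[[int, State, Action], float],
    transition: Callable[[int, State, Action], Dict[State, float]],
    terminal_value: Callable[[State], float] = lambda s: 0.0,
) -> Tuple[List[Dict[State, float]], List[Dict[State, Action]]]:
    """Finite-horizon backward induction over epochs t = 0, ..., horizon - 1."""
    values: List[Dict[State, float]] = [{} for _ in range(horizon + 1)]
    policy: List[Dict[State, Action]] = [{} for _ in range(horizon)]
    # Boundary condition: value of ending the horizon in each state.
    values[horizon] = {s: terminal_value(s) for s in states}
    for t in range(horizon - 1, -1, -1):  # step backward in time
        for s in states:
            best_action, best_value = None, float("-inf")
            for a in actions(t, s):
                # Immediate reward plus expected value of the next state.
                q = reward(t, s, a) + sum(
                    p * values[t + 1][s_next]
                    for s_next, p in transition(t, s, a).items()
                )
                if q > best_value:  # ties keep the first action found
                    best_action, best_value = a, q
            values[t][s] = best_value
            policy[t][s] = best_action
    return values, policy


# Toy usage: reward equals the current state and the action picks the next state.
values, policy = backward_induction(
    horizon=2,
    states=[0, 1],
    actions=lambda t, s: [0, 1],
    reward=lambda t, s, a: float(s),
    transition=lambda t, s, a: {a: 1.0},
)
print(values[0], policy[0])  # {0: 1.0, 1: 2.0} {0: 1, 1: 1}
```

Each epoch is solved once for every state and action, so the work grows linearly with the length of the horizon.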

A Markov decision process has the following five components: (1) a set of decision epochs, or time periods; (2) a state space, the set of states the system may occupy at a given point in time; (3) a set of actions available in the current state of the system; (4) a reward function that depends on the states and actions; and (5) a transition probability function that also depends on the states and actions. The application of Markov decision processes to inventory control models is widely accepted and provides a framework for the PHEV-SSMP model.

The Markov decision process for the PHEV-SSMP is characterized as follows: (1) decision epochs occur at a consistent time interval at which a swap station manager must determine the number of batteries to charge or discharge; (2) the state of the system is the total number of fully charged batteries, where each battery is either fully charged or depleted; (3) the action space is one-dimensional, with the decision maker choosing the total number of batteries to charge or discharge; (4) the reward function is defined using the expected reward criterion and comprises revenue from battery swaps, revenue from discharging batteries back to the power grid, and the cost of charging batteries; and (5) transition probabilities are determined by customer demand for battery swaps (which follows a discrete distribution), the current state, and the chosen action.
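To make these five components concrete, the following hypothetical Python sketch computes the expected one-epoch reward and the next-state distribution for a single state and action. The timing and accounting rules are assumptions for illustration only, not the thesis's formal model: the charging or discharging action is assumed to complete before customers arrive, unmet swap demand is assumed to be lost, and `swap_price`, `charge_price`, and `discharge_price` are assumed per-battery amounts for the current epoch. A positive action charges depleted batteries; a negative action discharges full batteries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def feasible_actions(full: int, n_batteries: int) -> List[int]:
    """Negative a = discharge |a| full batteries; positive a = charge a depleted ones."""
    return list(range(-full, n_batteries - full + 1))


def step_model(
    full: int,
    action: int,
    demand_pmf: Dict[int, float],
    swap_price: float,
    charge_price: float,
    discharge_price: float,
) -> Tuple[float, Dict[int, float]]:
    """Expected one-epoch reward and next-state distribution (hypothetical timing)."""
    # Assumption: charging/discharging completes before customers arrive.
    available = full + action
    # Cost of charging depleted batteries plus revenue from V2G discharging.
    expected_reward = -charge_price * max(action, 0) + discharge_price * max(-action, 0)
    next_state: Dict[int, float] = defaultdict(float)
    for demand, prob in demand_pmf.items():
        swaps = min(available, demand)  # assumption: unmet swap demand is lost
        expected_reward += prob * swap_price * swaps
        next_state[available - swaps] += prob  # each swap leaves one depleted battery
    return expected_reward, dict(next_state)


# Example: 4 batteries, 2 full, charge 1 more; demand is 0, 2, or 5 swaps.
r, p = step_model(2, 1, {0: 0.5, 2: 0.3, 5: 0.2}, swap_price=10.0,
                  charge_price=3.0, discharge_price=5.0)
print(round(r, 6), p)
```

Functions of this shape could serve as the reward and transition callbacks of a backward induction routine, with the prices and demand distribution indexed by epoch to reflect nonstationarity.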

A policy consists of decision rules that tell the decision maker which action to take in a given state at a given point in time. The objective in solving the Markov decision problem (MDP) is to determine a policy that maximizes the expected total reward. It can be proven that when the demand for swaps follows a discrete nonincreasing distribution, a monotone nonincreasing policy is optimal. The optimal policy, specifically the optimal number of batteries to charge and discharge, for this finite-horizon model is found using the backward induction algorithm [16]. The optimal policy is compared to two benchmark policies that are easy to implement at the swap station. In the first benchmark policy, labeled the stationary benchmark policy, the swap station maintains a single target inventory level of fully charged batteries regardless of the time of day and day of week. In the second benchmark policy, labeled the dynamic benchmark policy, the swap station maintains a distinct target inventory level for each time period (which captures time-of-day and day-of-week information). Each target level is based on the number of batteries at the swap station and the relationship between current and future charging costs. The action for each policy is calculated as the difference between the target level and the current number of fully charged batteries. If the swap station has more fully charged batteries than the target level, it discharges down to the target; if it has fewer fully charged batteries than the target level, it charges up to the target.
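The two benchmark rules can be sketched in a few lines, assuming the target levels have already been chosen (the thesis bases them on the number of batteries and on current versus future charging costs, which is not reproduced here). The names `target_action`, `stationary_benchmark`, and `dynamic_benchmark`, as well as clipping targets to the range [0, N] and cycling through the dynamic targets, are illustrative assumptions. In both cases the action is the target level minus the current number of fully charged batteries.

```python
from typing import Callable, Sequence

Policy = Callable[[int, int, int], int]  # (epoch, full batteries, total batteries) -> action


def target_action(full: int, target: int, n_batteries: int) -> int:
    """Charge up to (positive) or discharge down to (negative) a target level."""
    target = max(0, min(target, n_batteries))  # keep the target physically feasible
    return target - full


def stationary_benchmark(target: int) -> Policy:
    # One target for every hour of every day.
    return lambda t, full, n: target_action(full, target, n)


def dynamic_benchmark(targets: Sequence[int]) -> Policy:
    # One target per time period; the sequence repeats (e.g., a weekly cycle).
    return lambda t, full, n: target_action(full, targets[t % len(targets)], n)


flat = stationary_benchmark(6)
weekly = dynamic_benchmark([4, 8, 10])
print(flat(0, 9, 12), weekly(1, 5, 12))  # -3 3
```

The appeal of both rules is that a manager only needs a lookup table of targets, rather than a full state-by-epoch policy table.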

Using realistic data, the optimal solution method and the two benchmark policies are computationally tested to gain insight into the optimal operations and policies for a PHEV swap station. Two experiments based on Latin hypercube designs are performed. The first experiment is conducted to obtain overall information about various parameter inputs for the swap station. Specifically, the incentive that should be given by the power company is determined, and other statistically significant factors are analyzed. The second experiment is conducted to gain insight into how the controllable parameters at a swap station (e.g., number of batteries, swap price) should be set in relation to the number of PHEVs in a local area and power prices. Further, the results of the second experiment indicate that the dynamic benchmark policy outperforms the stationary benchmark policy; however, both exhibit the favorable characteristic of ease of implementation.
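For readers unfamiliar with Latin hypercube designs, the sketch below shows the basic construction in plain Python: each factor's range is split into as many equal-width strata as there are runs, one point is drawn inside each stratum, and the strata are paired randomly across factors. It is a generic illustration, not the design used in this thesis; the function name, factor names, and ranges in the example are hypothetical, and integer factors such as the number of batteries would need rounding after sampling.

```python
import random
from typing import List, Optional, Sequence, Tuple


def latin_hypercube(
    n_runs: int,
    bounds: Sequence[Tuple[float, float]],
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Return n_runs design points; each factor hits every one of n_runs strata once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each equal-width stratum of [0, 1).
        unit = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(unit)  # random pairing of strata across factors
        columns.append([low + u * (high - low) for u in unit])
    return [list(row) for row in zip(*columns)]


# Hypothetical factors: swap price ($) and number of batteries (rounded afterwards).
design = latin_hypercube(5, [(30.0, 60.0), (10.0, 50.0)], seed=1)
for swap_price, batteries in design:
    print(round(swap_price, 2), round(batteries))
```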


Main Contributions. The main contributions of this work are as follows: (i) development of a Markov decision process model to determine the optimal number of batteries to charge and discharge at a PHEV swap station given stochastic, nonstationary swap demand, nonstationary charging costs, and nonstationary discharging revenues; (ii) proof of the existence of a nonincreasing monotone optimal policy when demand is governed by a discrete nonincreasing distribution; (iii) generation of two benchmark policies that are easy for a swap station manager to implement; and (iv) analysis of the results from two designed experiments using realistic data, which provide policy insights for a swap station.

The remainder of this thesis is organized as follows. In Chapter II, relevant literature is reviewed in the areas of PHEV swap stations, uses of dynamic programming for energy storage problems, and inventory control Markov decision problems. In Chapter III, the PHEV-SSMP is formally defined as an inventory control MDP, including decision epochs, state space, action sets, reward function, and transition probability function. It is also proven in Chapter III that the PHEV-SSMP possesses a nonincreasing monotone structure, which motivates the optimal and two benchmark policy solution methods presented in Chapter IV. In Chapter V, the proposed model and solution methods are computationally validated by conducting two designed experiments, and the results are analyzed to arrive at policy insights. Conclusions and opportunities for future study are provided in Chapter VI.


II. Literature Review


Growing interest in electric-powered vehicles has led to extensive research on the topic in both industry and academia. Herein, literature relevant to the PHEV swap station application and the proposed solution approach is discussed. This review found no research that uses an inventory control MDP to model the operations of a PHEV swap station, deciding the number of batteries to charge and discharge while accounting for stochastic demand, nonstationary charging costs, and nonstationary revenue from discharging back to the power grid.

The need to examine PHEVs, and specifically PHEV swap stations, is motivated by a variety of studies. Idaho National Laboratory [18] analyzed the infrastructure requirements for charging PHEVs in both residential and commercial settings. The report explains that available charging infrastructure allows vehicles to be built with less energy storage capability, which reduces the overall cost of purchasing the vehicles. Transportation system costs can also be reduced by providing rich charging infrastructure rather than using larger batteries to compensate for sparser infrastructure.

Clement-Nyns et al. [2] address the issues caused by the large increase in electricity consumption due to PHEVs. Uncontrolled charging of these batteries in residential areas and at charging stations can lead to power losses, reduced power quality, and reliability problems. They use two techniques, quadratic programming and dynamic programming, to model efficient power grid operations. The results from these models indicate that coordination, which avoids charging PHEV batteries during periods of high-peak electricity consumption, can improve power quality and mitigate the effects of charging PHEV batteries. Bingliang et al. [3] study the impacts of various charging scenarios on China's power system using data from Shanghai's daily load profile and Monte Carlo simulation. Their results indicate that the level of charging and the increase of charging during high-peak hours have a significant effect on the load profile in Shanghai. They recommend investment in battery swapping stations to control the impact of charging plug-in electric vehicles.

Several approaches have been taken to optimize operations at PHEV swap stations. Worley and Klabjan [12] propose a dynamic programming model to determine the number of batteries for a swap station manager to purchase and the optimal number of batteries to charge at a given point in time. The objective is to minimize the total cost of charging batteries, the opportunity cost of charged batteries that were not used to meet demand, and a penalty defined for unmet demand. The authors allow unmet demand to be backlogged. Approximate solutions to the model are obtained by fitting a piecewise linear function to the objective function. The PHEV-SSMP is similar, but it does not consider battery purchasing decisions or backlogging of demand. However, unlike the model of Worley and Klabjan, the PHEV-SSMP incorporates discharging batteries back to the power grid using vehicle-to-grid (V2G) technology.

Nurre et al. [19] use a deterministic integer programming model to determine the optimal number of batteries to charge and discharge at a given time. Their model considers a cluster of locations and seeks to optimize operations at multiple swap stations in close proximity to one another such that profit is maximized. In addition to managing swap station operations to maximize profitability, the authors examine the impact of these policies on the power grid and seek to minimize the negative impact that wind energy, in conjunction with swap station operations, has on the power generation curve.

Mak et al. model infrastructure planning for battery swapping using robust optimization techniques, which support optimal decisions under limited and imprecise information. They consider two different objectives: the first minimizes the expected building and operating costs of the system, while the second maximizes a robust estimate of the probability of meeting a return-on-investment target. The decision problem consists of two stages: (1) determining where to locate swap stations with limited information regarding demand, and (2) stocking a sufficient number of batteries at each station once uncertain parameters such as demand are observed. Realistic test data are based on the San Francisco Bay Area freeway network. The results of the two models are similar, suggesting that the two objectives are correlated. Thus, the authors suggest using the return-on-investment goal-driven model, for computational efficiency, to produce good solutions for the cost-driven model. Finally, they examine how technological advances affect their model and determine that faster recharging technology is critical for increasing profitability.

Tang et al. construct an optimization model that seeks to maximize the annual profit of electric vehicle battery swap stations that include photovoltaic power generation. The system they describe has various components that provide charging power, including a photovoltaic array, which converts solar energy to direct current, and energy storage batteries. These energy storage batteries help regulate and balance the load on the power grid by storing excess generated power and discharging to the system when generated power is insufficient.

Zhang et al. [23] assess the adequacy of battery swap stations, examining their ability to have enough fully charged batteries to satisfy battery swapping demand. This is done by analyzing the probability that the number of fully charged batteries is greater than or equal to the number of electric vehicles with depleted batteries in any 1-hour interval. They use Monte Carlo simulation over a 10-day period to determine the expected number of electric vehicles that require a battery swap per hour. The demand results are compared to the current charging plan to determine the probability that demand does not exceed available supply, which provides valuable insight for charging management and V2G operations.
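The adequacy measure described above can be illustrated with a small Monte Carlo estimate of the probability that hourly swap demand does not exceed the number of fully charged batteries on hand. This sketch assumes Poisson-distributed hourly demand purely for illustration; it does not reproduce Zhang et al.'s demand model, and the function names are hypothetical. Only the Python standard library is used, so Poisson draws come from Knuth's multiplication method.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small hourly means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def adequacy_probability(full_batteries: int, mean_demand: float,
                         n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(hourly swap demand <= fully charged batteries on hand)."""
    rng = random.Random(seed)
    covered = sum(
        sample_poisson(mean_demand, rng) <= full_batteries for _ in range(n_trials)
    )
    return covered / n_trials


# With 4 full batteries and a mean of 3 swaps per hour, the exact Poisson value is about 0.815.
print(adequacy_probability(full_batteries=4, mean_demand=3.0))
```

With an assumed distribution this simple the probability could be computed exactly; simulation becomes worthwhile when demand depends on traffic patterns, charging plans, and battery state over many hours.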

Plug-in hybrid electric vehicles and PHEV swap stations have been examined in various other contexts as well. Pan et al. [24] present a two-stage stochastic model for locating swap stations, with the main objectives of meeting customer demand and reducing the variability that renewable technologies introduce on the power grid. The demand for PHEV battery swaps is characterized as a discrete random variable in a transportation network.

Eyer and Corey [4] discuss how increased use of PHEVs can help reduce the significant difference in electric energy prices between high-peak (on-peak) and low-peak (off-peak) periods. At night, energy prices are low because energy use is low. Energy prices are high when energy use is high, which is usually midday on weekdays. If PHEV usage continues to increase, there will be increased demand for electricity during off-peak periods, which will ultimately decrease the price difference and help balance the load on the power grid.

Sioshansi and Denholm estimate the value of PHEV V2G services in the electricity market using a unit commitment model. Vehicle-to-grid technology allows PHEVs to act as energy storage devices, reducing energy system operators' reliance on generators. V2G services include charging during off-peak hours of demand and discharging during high-peak hours of demand, which is commonly referred to as arbitrage. This has the potential to benefit not only the energy system but also the PHEV owner. By allowing their vehicles to be used as energy storage devices, PHEV owners can earn revenue that reduces their overall lifetime ownership costs. Sioshansi and Denholm use historical data from the Texas electric power system to analyze the benefit of incorporating V2G technology.
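The arbitrage idea described above can be sketched as choosing the cheapest hour to charge and a later, more expensive hour to discharge. The following Python sketch assumes a single battery, one charge/discharge cycle, no round-trip losses or degradation, and hypothetical hourly prices; it illustrates the concept only and is not a model of V2G markets or of the unit commitment approach of Sioshansi and Denholm.

```python
from typing import Optional, Sequence, Tuple


def best_arbitrage(prices: Sequence[float]) -> Tuple[float, Optional[int], Optional[int]]:
    """Best single charge-then-discharge pair for one battery (no losses assumed)."""
    best_gain: float = 0.0
    best_charge: Optional[int] = None
    best_discharge: Optional[int] = None
    cheapest = 0  # index of the lowest price seen so far
    for hour in range(1, len(prices)):
        gain = prices[hour] - prices[cheapest]
        if gain > best_gain:
            best_gain, best_charge, best_discharge = gain, cheapest, hour
        if prices[hour] < prices[cheapest]:
            cheapest = hour
    return best_gain, best_charge, best_discharge


# Hypothetical hourly prices in $/MWh: cheap overnight, expensive midday.
print(best_arbitrage([30.0, 22.0, 25.0, 48.0, 40.0]))  # (26.0, 1, 3)
```

A single pass suffices because the best discharge hour only needs to be compared against the cheapest earlier hour; if prices never rise, no trade is made.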

Energy storage problems generally involve balancing power from the grid and stochastic renewable energy sources, such as wind or solar power, to smooth energy fluctuations. These problems are being solved using dynamic programming, approximate dynamic programming methods, and other approximation methods. They are of interest because increased demand for electricity due to PHEVs has a direct impact on energy storage, and they have characteristics similar to those of the PHEV-SSMP.

Sioshansi et al. [25] use a dynamic programming approach to approximate the capacity value of energy storage devices. Capacity value is the metric used to quantify a resource's effect on system reliability and is used for resource adequacy planning. Using a deterministic profit-maximization dynamic program, they model storage operations that contribute to the capacity of the system. Using historical conventional generator, load, and price data to estimate the capacity value of a single storage device, they show that capacity values are sensitive to energy prices, with variability of up to 40%.

Salas and Powell [26] research the effectiveness of an approximate dynamic programming (ADP) algorithm for stochastic control of multidimensional energy storage problems. Their work primarily focuses on finite-horizon, grid-level storage problems. Stochastic elements of their model include wind energy supply, demand for electricity, and electricity prices. The ADP algorithm produced near-optimal control policies that were within 1.34% of the optimal solution for a variety of stochastic test problems and within 0.08% for various deterministic test problems.

Scott et al. [27] examine several approximate policy iteration methods. They use least-squares Bellman error minimization and also discuss direct policy search as an alternative method for approximating complex stochastic systems. Their approximate dynamic programming strategies are used to approximate the value function of a class of energy storage problems that require balancing power from the grid and renewable energy sources. Benchmark problems were used to test the performance of the algorithms. Bellman error minimization methods achieved 60–80% of the optimal solution value, while direct policy search results averaged 90% of the optimal solution value. The authors conclude that there are advantages to using direct policy search but recognize its limitations for time-dependent applications.

Inventory control MDPs, such as the one used for the PHEV-SSMP, also have a wide variety of other applications. Examples of inventory systems modeled as MDPs are examined here to gain insight into these applications. Many authors explore whether the optimal policy of their system has structure, which can be valuable due to ease of implementation and the ability to use algorithms with faster computation times. Structured policies may be monotone or may take the form of the commonly used (σ, Σ) policy. The curse of dimensionality is often mentioned in connection with MDPs and conventional solution methods (e.g., value iteration, policy iteration); thus, many of these problems are solved using heuristics and newly developed algorithms that approximate optimal solutions.

Inventory control MDPs have been used to model a wide variety of application areas, with the bulk of the literature focusing on supply chains. Giannoccaro and Pontrandolfo [28] model a supply chain management problem involving suppliers, manufacturers, and distributors. Zhang and Cooper [29] model simultaneous seat-inventory control of multiple flights as a customer-choice MDP that specifically examines how inventory levels affect the distribution of demand. Yin et al. [30] model an inventory control policy for the finished products of a large paper manufacturer facing stochastic demand. Lewis [31] examines an inventory control model with risk of supply chain disruptions, using the example of an international supply chain subject to border closures and congestion. ElHafsi [32] models inventory allocation for an assemble-to-order system with multiple demand classes as an MDP.


Determining whether a problem has structure can provide valuable insight into it. Puterman [16] emphasizes the benefits of optimal policies with structure such as monotonicity. These structured policies are significant because of their appeal to decision makers, ease of implementation, and faster computation time. One commonly used structured policy for inventory control is the (σ, Σ) policy, which orders up to a set level Σ once inventory falls below a set level σ. The concept of the (σ, Σ) policy was first presented by Scarf [33], who denotes it (s, S). ElHafsi [32] determines the structure of the inventory allocation model through a direct application of value iteration [16] rather than determining an optimal solution, due to the complex nature of the problem. Lewis [31] also uses value iteration to find an optimal order-up-to level for the international supply chain model.
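The (σ, Σ) rule described above is short enough to state directly in code. This Python sketch is a generic illustration with hypothetical function names; the `simulate` helper additionally assumes zero replenishment lead time and lost sales, which are simplifications chosen for the example rather than features of any model cited here.

```python
from typing import List, Sequence, Tuple


def order_quantity(inventory: int, sigma: int, big_sigma: int) -> int:
    """(sigma, Sigma) rule: if inventory < sigma, order up to Sigma; else order nothing."""
    if sigma > big_sigma:
        raise ValueError("need sigma <= Sigma")
    return big_sigma - inventory if inventory < sigma else 0


def simulate(demands: Sequence[int], start: int, sigma: int,
             big_sigma: int) -> List[Tuple[int, int]]:
    """Track (order, ending inventory) per period; zero lead time and lost sales assumed."""
    inventory, history = start, []
    for demand in demands:
        order = order_quantity(inventory, sigma, big_sigma)
        inventory = max(inventory + order - demand, 0)
        history.append((order, inventory))
    return history


print(simulate([4, 4, 4], start=10, sigma=5, big_sigma=12))  # [(0, 6), (0, 2), (10, 8)]
```

The whole policy is described by two numbers, which is the ease-of-implementation argument for structured policies in concrete form.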

When structure is not established, other solution methods must be explored for solving large-scale MDPs. Giannoccaro and Pontrandolfo [28] use a reinforcement learning (RL) algorithm and the average reward criterion to address some of the major issues in supply chain management, specifically focusing on an inventory ordering policy that maximizes supply chain performance. When tested on the supply chain management problem, the proposed approach proved effective and robust enough to deal with changing demand. Das et al. [34] also propose an RL approach, in conjunction with a semi-Markov average reward technique, to solve large-scale MDPs. Their algorithm uses RL to solve semi-MDPs under the average expected reward criterion. Semi-MDPs model sequential decision-making problems whose probability structures are not solely characterized by Markov chains. RL has an advantage over traditional methods of solving MDPs because it does not require computing transition probability matrices and reward vectors; instead, discrete event simulation is used to build a model. The semi-Markov average reward technique algorithm developed by Das et al. [34] was tested on a small-scale inventory control model and a larger-scale one, yielding fast and accurate results. Das et al. [34] use discrete event simulation to build their model because probability matrices and reward functions are difficult to obtain for large-scale MDPs.

Simulation techniques are widely used, especially for larger problems. Zhang and Cooper [22] use simulation techniques to solve a stochastic optimization problem in which the demand distribution of customer seat choices depends on the state of the system. The model's complexity makes an exact solution very difficult to find. The authors therefore derive upper and lower bounds on the value function using simulation techniques and heuristics. Chang et al. [35] suggest using simulation in future research with their adaptive sampling algorithm, which approximates the optimal value of a finite-horizon MDP.

The PHEV-SSMP is solved using the backward induction algorithm [16]. This algorithm finds the optimal policy, specifically the optimal number of batteries to charge and discharge at each decision epoch, which maximizes the expected total reward. Structural properties of the system are examined. It is shown that, when demand follows a nonincreasing discrete distribution, there exists an optimal policy with a nonincreasing monotone structure.
III. Problem Statement


The PHEV-SSMP is solved by determining the optimal number of batteries to charge and discharge over time. The problem is modeled as a Markov decision problem (MDP) with stochastic, nonstationary demand for battery swaps, nonstationary charging costs, and nonstationary revenue from discharging. A finite-horizon, single-product inventory control model is considered because the problem data are highly variable over time. Nonstationary Markov decision processes relax the assumption that the problem data do not change with time. In general, they cannot be solved as infinite-horizon models because of their infinite data requirements [HZl]. The nonstationary quantities in the PHEV-SSMP are the demand for battery swaps, the charging price for batteries, and the revenue from discharging batteries back to the power grid. The objective that drives the decisions making up the optimal policy is maximizing profitability at a single swap station.

Within the MDP model, the state is defined as the total number of batteries that are fully charged. Batteries are modeled at a fundamental level: each battery is either fully charged or depleted. When the discharging revenue is less than or equal to the charging price, a solution in which charging and discharging occur simultaneously can be equivalently represented as solely charging or solely discharging. Thus, the system is modeled such that charging and discharging never occur simultaneously. If the discharging revenue is greater than the charging price, the simplifying assumption is made that the PHEV swap station solely charges or solely discharges at any point in time.

The swap station may discharge up to the minimum of two quantities: the total number of fully charged batteries and the total number of plug-ins available. In this context, a plug-in is the physical entity at a swap station that connects a battery to the power grid. This connection allows the battery to draw from the power grid (i.e., charge) or to discharge using V2G. The total number of plug-ins, called the charging capacity, is assumed constant over time. Similarly, the swap station may charge up to the total number of depleted batteries, provided that the charging capacity is not exceeded. The total number of batteries at the swap station is likewise constant over time.

Figure 1. Diagram outlining the timing of events for the PHEV-SSMP MDP model.

The system is modeled such that batteries charged at time t become full at time t + 1. Batteries that are discharged take one time period to deplete but are immediately unavailable for exchange. Only fully charged batteries are available for exchange or discharging. Furthermore, fully charged batteries are always swapped, if available, when demand arrives. The cost to charge batteries and the revenue from discharging them are realized during the time period in which the decision is made. Backlogging of demand is not permitted, as customers are assumed not to wait at the station if batteries are unavailable. The expected reward criterion captures three components:

- revenue from battery swaps,
- revenue from discharging batteries back to the power grid through V2G technology, and
- the cost to charge batteries at the swap station.

The event timing for the PHEV swap station is outlined in Figure 1.
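The sketch below illustrates this event timing by advancing one period, given the number of full batteries, a charge or discharge decision, and a realized demand. It assumes, as above, two things. First, newly charged batteries cannot be swapped until the next period. Second, discharged batteries leave the swappable pool immediately. The function name, its arguments and the example numbers are hypothetical. The charging capacity and the total battery count are not checked here.

```python
def simulate_period(full: int, action: int, demand: int,
                    swap_price: float, charge_cost: float, discharge_revenue: float):
    """One period of the event timing (hypothetical helper).

    full   -- fully charged batteries at the start of the period
    action -- > 0 charges that many depleted batteries, < 0 discharges |action| full ones
    demand -- realized number of swap requests; unmet demand is lost (no backlogging)
    Returns (swaps, period_profit, full_batteries_next_period).
    """
    charged, discharged = max(action, 0), max(-action, 0)
    available = full - discharged             # discharged batteries are unavailable at once
    if available < 0:
        raise ValueError("cannot discharge more batteries than are fully charged")
    swaps = min(demand, available)            # full batteries are always swapped if available
    profit = swap_price * swaps - charge_cost * charged + discharge_revenue * discharged
    full_next = available - swaps + charged   # charged batteries become full at t + 1
    return swaps, profit, full_next


if __name__ == "__main__":
    # 15 full batteries, charge 5, 13 customers arrive (illustrative numbers).
    print(simulate_period(15, 5, 13, swap_price=10.0, charge_cost=2.0, discharge_revenue=3.0))
```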
The MDP for the PHEV-SSMP is mathematically characterized using the following notation:

1. The set of decision epochs¹, $T = \{1, 2, \ldots, N-1\}$, $N < \infty$, indicates the discrete time periods in which a decision is made. As previously stated, a finite time horizon is considered because of the nonstationary properties of the problem.

2. The state of the system at time t, $s_t \in S = \{0, 1, \ldots, M\}$, indicates the total number of batteries that are fully charged at decision epoch t. Here M is the total number of batteries at the swap station; thus $M - s_t$ is the number of depleted batteries at time t.

3. The action at time t, $a_t \in A_{s_t} = \{\max(-s_t, -\phi), \ldots, 0, \ldots, \min(M - s_t, \phi)\}$, $\forall s_t \in S$, indicates the total number of batteries to charge or discharge at time t, where $\phi$ is the charging capacity of the system. A negative action indicates discharging batteries and a positive action indicates charging batteries. For clarity in the model, the action space is further defined. Let

$$a_t^+ = \begin{cases} a_t & \text{if } a_t > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

$$a_t^- = \begin{cases} |a_t| & \text{if } a_t < 0, \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

where $a_t^+$ is the number of batteries charged and $a_t^-$ is the number of batteries discharged at time t. An assumption of the model is that $a_t^+$ and $a_t^-$ cannot both be positive during any time interval t.

¹ Decision epoch and time period are used interchangeably throughout this thesis.
4. The immediate reward when action $a_t$ is selected in state $s_t$ at time t, leading to a transition to state $s_{t+1}$, is the profitability of the system, given by

$$r_t(s_t, a_t, s_{t+1}) = \rho\,(s_t + a_t - s_{t+1}) - K_t a_t^+ + J_t a_t^- \tag{3}$$

for $t = 1, \ldots, N-1$. Here $s_t + a_t - s_{t+1} = \min\{D_t, s_t - a_t^-\}$ is the number of batteries swapped at time t. The remaining symbols are defined as follows:

- the discrete random variable $D_t$ represents the demand for battery swaps at time t;
- $s_t - a_t^-$ is the number of batteries available for exchange;
- $\rho$ is the revenue per battery swap;
- $K_t$ is the charging cost per battery at time t;
- $J_t$ is the revenue earned per battery discharged at time t.

Specifying $K_t$ and $J_t$ captures the impact of the nonstationary price for power over time. The terminal reward is the potential swap revenue from fully charged batteries; thus $r_N(s_N) = \rho s_N$.

5. The total number of batteries fully charged at decision epoch t + 1 is directly affected by the batteries charged, discharged, and exchanged during decision epoch t by way of $s_{t+1} = s_t + a_t - \min\{D_t, s_t - a_t^-\}$. The probability of transitioning to state j at time t + 1 from state $s_t$ when action $a_t$ is taken, denoted $p_t(j \mid s_t, a_t)$, is defined by

$$p_t(j \mid s_t, a_t) = \begin{cases} 0 & \text{if } j > s_t + a_t \text{ or } j < a_t^+, \\ p_{s_t + a_t - j} & \text{if } a_t^+ < j \le s_t + a_t, \\ q_{s_t + a_t - j} & \text{if } j = a_t^+, \end{cases} \tag{4}$$

where $p_j = P(D_t = j)$ and $q_u = \sum_{j=u}^{\infty} p_j = P(D_t \ge u)$. For further clarification, $s_t + a_t - j$ is the number of fully charged batteries that are swapped in period t. Likewise, $s_t + a_t$ is the number of fully charged batteries on hand at the end of the period if none are swapped.
• In the first conditional, state j falls outside the reachable range, for one of two reasons. Either j exceeds the number of fully charged batteries the swap station could possibly have on hand at the end of the period, or j is less than the number of batteries the swap station chooses to charge. Those newly charged batteries are not available for exchange until after demand is met in that period. In both cases the transition probability is zero.

• In the second conditional, state j lies between the number of batteries the swap station charges and the number of batteries that could possibly be on hand at the end of the period. In this situation, the swap station has enough fully charged batteries to meet demand. Hence the probability of transitioning to state j is calculated using the time-dependent discrete distribution of demand. It has already been established that j cannot fall below the number of batteries charged in that period. Thus j is bounded below by $a_t^+$, and the boundary case $j = a_t^+$ is handled by the third conditional.

• The last conditional is $j = a_t^+$, meaning that demand for battery swaps meets or exceeds the supply of fully charged batteries at the beginning of the period. In this situation, the station swaps all batteries on hand but acquires the charged batteries at the end of the period. The transition probability in this case is the cumulative probability that demand meets or exceeds the number of batteries available for swapping in period t.
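The sketch below covers the action set $A_{s_t}$ from item 3 and the split of Equations (1) and (2). It assumes integer-valued states and actions. The function names and the capacity value in the example are hypothetical.

```python
def action_set(s: int, M: int, phi: int) -> list[int]:
    """A_s = {max(-s, -phi), ..., 0, ..., min(M - s, phi)}.

    Negative actions discharge and positive actions charge; phi is the charging
    capacity (number of plug-ins) and M is the total number of batteries.
    """
    if not 0 <= s <= M:
        raise ValueError("state must lie in {0, ..., M}")
    return list(range(max(-s, -phi), min(M - s, phi) + 1))


def split_action(a: int) -> tuple[int, int]:
    """Return (a^+, a^-) as in Equations (1) and (2); at most one is positive."""
    return max(a, 0), max(-a, 0)


if __name__ == "__main__":
    M, phi = 10, 4                      # hypothetical station size and capacity
    for s in (0, 3, 9):
        print(s, action_set(s, M, phi))
    print(split_action(-3), split_action(2))
```

With no full batteries only charging is possible. With nearly all batteries full, the few depleted ones cap the charging side.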

To aid the reader, the transition probability function is illustrated using a simple example. Consider the case where there are 15 fully charged batteries (i.e., $s_t = 15$) and the swap station charges 5 (i.e., $a_t = a_t^+ = 5$).

- If no batteries are swapped, the station will have a total of 20 batteries at the end of the period (i.e., $s_{t+1} = j = s_t + a_t = 20$). There is no way to have more than $s_t + a_t = 20$ batteries at the end of the period, so the transition probability to any state greater than 20 is zero.
- At the beginning of the period there are $s_t - a_t^- = 15$ batteries available for exchange. Even if all fully charged batteries are swapped, the station still acquires the 5 charged batteries by the end of the period. Therefore, the transition probability to any state less than $a_t^+ = 5$ is zero.
- When $j = a_t^+ = 5$, all 15 batteries that were available at the beginning of the period must have been swapped, since the 5 charged batteries are acquired at the end of the period. The transition probability in this case is the probability that demand meets or exceeds $s_t + a_t - j = 15$ batteries, which is captured in the third conditional.
- Now consider the case where the station has 7 batteries at the end of the period (i.e., j = 7, which lies between $a_t^+$ and $s_t + a_t$). Since 5 batteries were charged, 2 remain from the inventory at the start of the period. The station started with 15 charged batteries, so 13 of them must have been swapped. Thus the transition probability to 7 batteries is the probability that demand for battery swaps equals $s_t + a_t - j = 13$.
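The three conditionals of Equation (4) can be coded directly, and the worked example checked against them. The sketch below assumes a demand distribution with finite support, given as a NumPy array `pmf` with `pmf[d]` $= P(D_t = d)$. The uniform distribution on $\{0, \ldots, 20\}$ and $M = 25$ are hypothetical values chosen only to reproduce the example.

```python
import numpy as np

def transition_probs(s: int, a: int, M: int, pmf: np.ndarray) -> np.ndarray:
    """Vector P with P[j] = p_t(j | s, a) for j = 0, ..., M (Equation (4))."""
    a_plus, a_minus = max(a, 0), max(-a, 0)
    if s - a_minus < 0 or s + a > M:
        raise ValueError("action is infeasible in this state")
    P = np.zeros(M + 1)
    # Second conditional: a^+ < j <= s + a, i.e. demand equals s + a - j.
    for j in range(a_plus + 1, s + a + 1):
        d = s + a - j
        P[j] = pmf[d] if d < len(pmf) else 0.0
    # Third conditional: j = a^+, demand meets or exceeds s - a^- = s + a - a^+.
    P[a_plus] = pmf[s - a_minus:].sum()
    return P  # every other j (first conditional) keeps probability zero


if __name__ == "__main__":
    pmf = np.full(21, 1 / 21)           # hypothetical uniform demand on {0, ..., 20}
    P = transition_probs(s=15, a=5, M=25, pmf=pmf)
    print(P[7], pmf[13])                # j = 7 corresponds to 13 swaps
    print(P[5], pmf[15:].sum())         # j = 5 corresponds to demand of at least 15
    print(P[:5].sum(), P[21:].sum(), P.sum())
```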

Having specified the transition probability function $p_t(j \mid s_t, a_t)$, the expected immediate reward function can be expressed in terms of the current state and action only, which is more convenient for subsequent calculations:

$$r_t(s_t, a_t) = \sum_{s_{t+1} \in S} p_t(s_{t+1} \mid s_t, a_t)\, \rho\,(s_t + a_t - s_{t+1}) - K_t a_t^+ + J_t a_t^- \tag{5}$$
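Equation (5) can be checked numerically against the direct expectation $\rho\,E[\min\{D_t, s_t - a_t^-\}] - K_t a_t^+ + J_t a_t^-$, which follows from $s_t + a_t - s_{t+1} = \min\{D_t, s_t - a_t^-\}$. This sketch assumes finite-support demand. The prices and the demand distribution in the example are hypothetical.

```python
import numpy as np

def reward_eq5(s, a, M, pmf, rho, K, J):
    """Expected immediate reward via Equation (5): a sum over next states j."""
    a_plus, a_minus = max(a, 0), max(-a, 0)
    P = np.zeros(M + 1)
    for j in range(a_plus + 1, s + a + 1):          # second conditional of (4)
        d = s + a - j
        P[j] = pmf[d] if d < len(pmf) else 0.0
    P[a_plus] = pmf[s - a_minus:].sum()             # third conditional of (4)
    swaps = s + a - np.arange(M + 1)                # batteries swapped if s_{t+1} = j
    return float(P @ (rho * swaps)) - K * a_plus + J * a_minus


def reward_direct(s, a, pmf, rho, K, J):
    """The same quantity written as rho * E[min(D_t, s - a^-)] - K a^+ + J a^-."""
    a_plus, a_minus = max(a, 0), max(-a, 0)
    demand = np.arange(len(pmf))
    return rho * float(pmf @ np.minimum(demand, s - a_minus)) - K * a_plus + J * a_minus


if __name__ == "__main__":
    pmf = np.array([0.5, 0.3, 0.2])                 # hypothetical demand distribution
    print(reward_eq5(2, 1, 5, pmf, rho=10.0, K=3.0, J=4.0))
    print(reward_direct(2, 1, pmf, rho=10.0, K=3.0, J=4.0))
```

The second form avoids building the transition vector. It is convenient later when the expected immediate reward is compared across states and actions.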


The decision rules, denoted $d_t(s_t)$, indicate to the decision maker how to select an action $a_t \in A_{s_t}$ at a given decision epoch $t \in T$ when in state $s_t \in S$. Because the decision rules depend on the current state of the system and not on the entire history of states, Markovian decision rules [16] are considered. Furthermore, the decision rules prescribe a single specific action rather than a probability distribution on the action set; therefore, the decision rules are deterministic. A policy $\pi$ is a sequence of decision rules $(d_1(s_1), d_2(s_2), \ldots, d_{N-1}(s_{N-1}))$ that specifies the decision rule to be used at every decision epoch.
The expected total reward of a policy $\pi$ when the initial state of the system is $s_1$, denoted $v_N^\pi(s_1)$, is given by

$$v_N^\pi(s_1) = E_{s_1}^{\pi}\left[\sum_{t=1}^{N-1} r_t(s_t, a_t) + r_N(s_N)\right] \tag{6}$$

The objective is to determine the policy $\pi^*$ with the maximum expected total reward. The optimal value function, $u_t^*(s_t)$, denotes the maximum over all policies of the expected total reward from decision epoch t onward when the state at time t is $s_t$. The optimality equations, or Bellman equations, corresponding to the optimal value functions are used as the basis for determining optimal policies. The optimality equations are given by

$$u_t(s_t) = \max_{a_t \in A_{s_t}} \left\{ r_t(s_t, a_t) + \sum_{j \in S} p_t(j \mid s_t, a_t)\, u_{t+1}(j) \right\} \tag{7}$$


for $t = 1, \ldots, N-1$ and $s_t \in S$. For $t = N$, $u_N(s_N) = r_N(s_N)$. It can be shown that if $u_t(s_t)$ is a solution to Equation (7), then the following hold:

1. $u_t^*(s_t) = u_t(s_t)$ for all $s_t \in S$, $t = 1, \ldots, N$, and

2. $v_N^{\pi^*}(s_1) = u_1(s_1)$ for all $s_1 \in S$.

In other words, solutions of the optimality equations are the optimal value functions. The solution at t = 1 gives the maximum expected total reward over the entire time horizon. Since S is finite and $A_{s_t}$ is finite for each $s_t \in S$, there exists a deterministic Markovian policy that is optimal [16].
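A compact sketch of backward induction on Equation (7) follows. It makes these assumptions:

- demand distributions have finite support and are supplied per epoch;
- the per-battery prices $K_t$ and $J_t$ are scalars;
- epoch indices in code are 0-based, so index t corresponds to decision epoch t + 1;
- ties in the maximization are broken toward the smallest action, which is an arbitrary choice.

The function name and the small instance in the example are hypothetical.

```python
import numpy as np

def transition_probs(s, a, M, pmf):
    """p_t(j | s, a) for j = 0..M, Equation (4); pmf[d] = P(D_t = d)."""
    a_plus, a_minus = max(a, 0), max(-a, 0)
    P = np.zeros(M + 1)
    for j in range(a_plus + 1, s + a + 1):
        d = s + a - j
        if d < len(pmf):
            P[j] = pmf[d]
    P[a_plus] = pmf[s - a_minus:].sum()
    return P


def backward_induction(M, phi, rho, K, J, pmfs):
    """Solve the optimality equations (7) backward from u_N(s) = rho * s.

    K[t], J[t] and pmfs[t] hold the charging cost, discharging revenue and demand
    pmf of decision epoch t + 1 (0-based lists of length N - 1).
    Returns (u, d): u[t, s] is the optimal value and d[t, s] an optimal action.
    """
    n_epochs = len(pmfs)
    states = np.arange(M + 1)
    u = np.zeros((n_epochs + 1, M + 1))
    u[n_epochs] = rho * states                      # terminal reward r_N(s_N)
    d = np.zeros((n_epochs, M + 1), dtype=int)
    for t in range(n_epochs - 1, -1, -1):
        for s in range(M + 1):
            best_val, best_a = -np.inf, 0
            for a in range(max(-s, -phi), min(M - s, phi) + 1):
                P = transition_probs(s, a, M, pmfs[t])
                swaps = s + a - states              # swaps if the next state is j
                # r_t(s, a) + sum_j p_t(j | s, a) u_{t+1}(j), using sum_j P[j] = 1
                val = P @ (rho * swaps + u[t + 1]) - K[t] * max(a, 0) + J[t] * max(-a, 0)
                if val > best_val:
                    best_val, best_a = val, a
            u[t, s], d[t, s] = best_val, best_a
    return u, d


if __name__ == "__main__":
    # Hypothetical 3-epoch instance: 6 batteries, 3 plug-ins, a price peak in epoch 2.
    pmfs = [np.array([0.3, 0.4, 0.2, 0.1])] * 3
    u, d = backward_induction(M=6, phi=3, rho=8.0, K=[1.0, 4.0, 2.0], J=[0.5, 6.0, 1.0], pmfs=pmfs)
    print(u[0])
    print(d)
```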


3.1  Structural  Properties 

Determining whether the optimal policy of an MDP has structure, such as monotonicity, is significant for three reasons: ease of implementation, appeal to decision makers, and the potential for faster computation [16]. When an optimal policy has a monotone structure, it can be found with specialized and more efficient algorithms. Thus, it is advantageous to prove that there exists an optimal policy with a nonincreasing monotone structure.

A policy $\pi$ is said to be nonincreasing if, for each $t = 1, \ldots, N-1$ and any pair of states $s_i, s_j \in S$ with $s_i \le s_j$, it is true that $d_t(s_i) \ge d_t(s_j)$. The existence of an optimal nonincreasing monotone policy can be demonstrated using a series of five properties regarding the reward function and the probability of moving to a higher state [16]. Define

$$g_t(k \mid s_t, a_t) = \sum_{j=k}^{M} p_t(j \mid s_t, a_t), \quad t = 1, \ldots, N-1, \tag{8}$$

as the probability of moving to a state $j \ge k$ at decision epoch t + 1 when action $a_t$ is chosen in state $s_t$ at decision epoch t. Let $A_{s_t} = A'$ for all $s_t \in S$, where $A' = \bigcup_{s \in S} A_s$ is the set of all possible actions independent of the state of the system. Note that a function $f(x, y)$ is said to be subadditive [16] if, for $\bar{x} \ge \underline{x}$ in X and $\bar{y} \ge \underline{y}$ in Y,

$$f(\bar{x}, \bar{y}) + f(\underline{x}, \underline{y}) \le f(\bar{x}, \underline{y}) + f(\underline{x}, \bar{y}). \tag{9}$$
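Condition (9) can be checked by brute force for a function tabulated on integer grids. The sketch below is illustrative only, and the test functions are hypothetical examples. For instance, $-xy$ is subadditive because $(\bar{x}-\underline{x})(\bar{y}-\underline{y}) \ge 0$.

```python
import numpy as np

def is_subadditive(f: np.ndarray, tol: float = 1e-12) -> bool:
    """Check Equation (9) for f[x, y] tabulated on integer grids.

    Tests f(xh, yh) + f(xl, yl) <= f(xh, yl) + f(xl, yh) for every xh >= xl
    and yh >= yl (O(nx^2 * ny^2) comparisons, fine for small grids).
    """
    nx, ny = f.shape
    for xl in range(nx):
        for xh in range(xl, nx):
            for yl in range(ny):
                for yh in range(yl, ny):
                    if f[xh, yh] + f[xl, yl] > f[xh, yl] + f[xl, yh] + tol:
                        return False
    return True


if __name__ == "__main__":
    x, y = np.meshgrid(np.arange(4), np.arange(5), indexing="ij")
    print(is_subadditive(-x * y))    # True
    print(is_subadditive(x * y))     # False
    print(is_subadditive(x**2 + y))  # True: separable functions satisfy (9) with equality
```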

First, three lemmas are outlined that are used in proving that there exists a nonincreasing monotone policy that is optimal.


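Lemma 1. The function $g_t(k \mid s_t, a_t)$ can be written as

$$g_t(k \mid s_t, a_t) = \sum_{j=\max\{a_t^+ + 1,\, k\}}^{s_t + a_t} p_{s_t + a_t - j} + \mathbb{1}\{a_t^+ \ge k\} \sum_{i = s_t + a_t - a_t^+}^{\infty} p_i, \tag{10}$$

where the indicator $\mathbb{1}\{a_t^+ \ge k\}$ records that the second term is present only when $j = a_t^+$ also satisfies $j \ge k$.

Proof.

$$
\begin{aligned}
g_t(k \mid s_t, a_t) &= \sum_{j \in \{S \mid j \ge k\}} p_t(j \mid s_t, a_t) && (11)\\
&= \sum_{\substack{j \ge k \\ a_t^+ < j \le s_t + a_t}} p_{s_t + a_t - j} + \sum_{\substack{j \ge k \\ j = a_t^+}} q_{s_t + a_t - j} && (12)\\
&= \sum_{j=\max\{a_t^+ + 1,\, k\}}^{s_t + a_t} p_{s_t + a_t - j} + \mathbb{1}\{a_t^+ \ge k\} \sum_{i = s_t + a_t - a_t^+}^{\infty} p_i && (13)
\end{aligned}
$$

□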


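Lemma 2. The following two summations are equivalent:

$$\sum_{j=k}^{s_t + a_t} p_{s_t + a_t - j} = \sum_{i=0}^{s_t + a_t - k} p_i. \tag{14}$$

Proof.

$$
\begin{aligned}
\sum_{j=k}^{s_t + a_t} p_{s_t + a_t - j} &= p_{s_t + a_t - k} + p_{s_t + a_t - (k+1)} + \cdots + p_{s_t + a_t - (s_t + a_t)} && (15)\\
&= p_{s_t + a_t - k} + p_{s_t + a_t - (k+1)} + \cdots + p_0 = \sum_{i=0}^{s_t + a_t - k} p_i && (16)
\end{aligned}
$$

□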


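Lemma 3. The following two summations are equivalent:

$$\sum_{j=a_t^+ + 1}^{s_t + a_t} p_{s_t + a_t - j} + \sum_{i = s_t + a_t - a_t^+}^{\infty} p_i = \sum_{i=0}^{\infty} p_i. \tag{17}$$

Proof. Using $s_t + a_t - a_t^+ = s_t - a_t^-$,

$$
\begin{aligned}
&\sum_{j=a_t^+ + 1}^{s_t + a_t} p_{s_t + a_t - j} + \sum_{i = s_t + a_t - a_t^+}^{\infty} p_i && (18)\\
&= p_{s_t + a_t - (a_t^+ + 1)} + p_{s_t + a_t - (a_t^+ + 2)} + \cdots + p_{s_t + a_t - (s_t + a_t)} + \sum_{i = s_t - a_t^-}^{\infty} p_i && (19)\\
&= p_{s_t - a_t^- - 1} + p_{s_t - a_t^- - 2} + \cdots + p_0 + \sum_{i = s_t - a_t^-}^{\infty} p_i && (20)\\
&= \sum_{i=0}^{s_t - a_t^- - 1} p_i + \sum_{i = s_t - a_t^-}^{\infty} p_i = \sum_{i=0}^{\infty} p_i && (21)
\end{aligned}
$$

□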
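Lemmas 1 through 3 can be spot-checked numerically. The check compares the closed form (10) with the tail sum (8) computed from Equation (4), and evaluates both sides of (14) and (17). This sketch assumes a finite-support demand distribution (`pmf`). The instances in the test are hypothetical, and a passing check is an illustration rather than a proof.

```python
import numpy as np

def transition_probs(s, a, M, pmf):
    """p_t(j | s, a) for j = 0..M per Equation (4); pmf[d] = P(D_t = d)."""
    a_plus, a_minus = max(a, 0), max(-a, 0)
    P = np.zeros(M + 1)
    for j in range(a_plus + 1, s + a + 1):
        d = s + a - j
        if d < len(pmf):
            P[j] = pmf[d]
    P[a_plus] = pmf[s - a_minus:].sum()
    return P


def p_(pmf, u):
    """p_u = P(D_t = u), zero outside the stored support."""
    return pmf[u] if 0 <= u < len(pmf) else 0.0


def g_lemma1(k, s, a, pmf):
    """Closed form (10) of Lemma 1."""
    a_plus = max(a, 0)
    first = sum(p_(pmf, s + a - j) for j in range(max(a_plus + 1, k), s + a + 1))
    second = pmf[s + a - a_plus:].sum() if a_plus >= k else 0.0
    return first + second


def max_lemma_error(M, phi, pmf):
    """Largest absolute discrepancy in Lemmas 1-3 over all feasible (s, a, k)."""
    err = 0.0
    for s in range(M + 1):
        for a in range(max(-s, -phi), min(M - s, phi) + 1):
            n, a_plus = s + a, max(a, 0)
            lemma3 = sum(p_(pmf, n - j) for j in range(a_plus + 1, n + 1)) + pmf[n - a_plus:].sum()
            err = max(err, abs(lemma3 - pmf.sum()))
            tail = transition_probs(s, a, M, pmf)
            for k in range(M + 1):
                err = max(err, abs(tail[k:].sum() - g_lemma1(k, s, a, pmf)))
                lhs2 = sum(p_(pmf, n - j) for j in range(k, n + 1))
                rhs2 = sum(p_(pmf, i) for i in range(0, n - k + 1))
                err = max(err, abs(lhs2 - rhs2))
    return err


if __name__ == "__main__":
    print(max_lemma_error(M=6, phi=3, pmf=np.array([0.4, 0.3, 0.2, 0.1])))
```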

Using these lemmas, the existence of an optimal nonincreasing monotone policy is established in Theorem 1.

Theorem 1. There exist optimal decision rules $d_t^*(s_t)$ for the PHEV-SSMP that are nonincreasing in $s_t$ for $t = 1, \ldots, N-1$ when demand $D_t$ is governed by a nonincreasing discrete distribution.

Proof. The claim is shown by demonstrating that the PHEV-SSMP satisfies the following five conditions [16].

1. $r_t(s_t, a_t)$ is nondecreasing in $s_t$ for all $a_t \in A'$.

That $r_t(s_t, a_t)$ is nondecreasing in $s_t$ for a fixed $a_t$ has a simple meaning. For a fixed action (i.e., number of batteries charged or discharged), the expected immediate reward is at least as large when the number of full batteries is larger. This coincides with intuition: more batteries can be swapped or discharged when more full batteries are available, leading to more reward. Consider $\hat{s}_t \ge s_t$ and use $s_t + a_t - s_{t+1} = \min\{D_t, s_t - a_t^-\}$ for any value that $D_t$ can assume. It can be shown that

$$r_t(\hat{s}_t, a_t) \ge r_t(s_t, a_t). \tag{22}$$

The expected immediate reward function can be expressed as

$$r_t(s_t, a_t) = \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - a_t^-\} - K_t a_t^+ + J_t a_t^-. \tag{23}$$

Therefore, it can be shown that

$$
\begin{aligned}
r_t(\hat{s}_t, a_t) \ge r_t(s_t, a_t) \iff{} & \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, \hat{s}_t - a_t^-\} - K_t a_t^+ + J_t a_t^- && (24)\\
& \ge \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - a_t^-\} - K_t a_t^+ + J_t a_t^- && (25)\\
\iff{} & \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, \hat{s}_t - a_t^-\} \ge \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - a_t^-\} && (26)
\end{aligned}
$$


Since each term on both sides carries the same nonnegative factor $P(D_t = j)\rho$, it suffices to show that

$$\min\{j, \hat{s}_t - a_t^-\} \ge \min\{j, s_t - a_t^-\} \tag{27}$$

for all possible values of j. Using a proof by cases, three cases of demand $D_t = j$ relative to $s_t - a_t^-$ and $\hat{s}_t - a_t^-$ are considered:

- (a) $j \le s_t - a_t^-$ and $j \le \hat{s}_t - a_t^-$;
- (b) $j \ge s_t - a_t^-$ and $j \le \hat{s}_t - a_t^-$;
- (c) $j \ge s_t - a_t^-$ and $j \ge \hat{s}_t - a_t^-$.

The case where j is greater than $\hat{s}_t - a_t^-$ and less than $s_t - a_t^-$ need not be considered, because it is impossible when $\hat{s}_t \ge s_t$. In each case, Equation (27) reduces to a valid statement.

(a) $j \le s_t - a_t^-$, $j \le \hat{s}_t - a_t^-$

$$\min\{j, \hat{s}_t - a_t^-\} \ge \min\{j, s_t - a_t^-\} \iff j = j \tag{28}$$

(b) $j \ge s_t - a_t^-$, $j \le \hat{s}_t - a_t^-$

$$\min\{j, \hat{s}_t - a_t^-\} \ge \min\{j, s_t - a_t^-\} \iff j \ge s_t - a_t^- \tag{29}$$

(c) $j \ge s_t - a_t^-$, $j \ge \hat{s}_t - a_t^-$

$$\min\{j, \hat{s}_t - a_t^-\} \ge \min\{j, s_t - a_t^-\} \iff \hat{s}_t - a_t^- \ge s_t - a_t^- \iff \hat{s}_t \ge s_t \tag{30}$$

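2. $g_t(k \mid s_t, a_t)$ is nondecreasing in $s_t$ for all $k \in S$ and $a_t \in A'$.

That $g_t(k \mid s_t, a_t)$ is nondecreasing in $s_t$ for fixed $a_t$ and k has the following meaning. The probability that the next state has at least k full batteries is at least as large when the current state has more full batteries. Consider $\hat{s}_t \ge s_t$; it can be shown that

$$
\begin{aligned}
& g_t(k \mid \hat{s}_t, a_t) \ge g_t(k \mid s_t, a_t) && (31)\\
\iff{} & \sum_{j \in \{S \mid j \ge k\}} p_t(j \mid \hat{s}_t, a_t) \ge \sum_{j \in \{S \mid j \ge k\}} p_t(j \mid s_t, a_t) && (32)\\
\iff{} & \sum_{\substack{j \ge k \\ a_t^+ < j \le \hat{s}_t + a_t}} p_{\hat{s}_t + a_t - j} + \sum_{\substack{j \ge k \\ j = a_t^+}} q_{\hat{s}_t + a_t - j} \ge \sum_{\substack{j \ge k \\ a_t^+ < j \le s_t + a_t}} p_{s_t + a_t - j} + \sum_{\substack{j \ge k \\ j = a_t^+}} q_{s_t + a_t - j} && (33)\\
\iff{} & \sum_{j=\max\{a_t^+ + 1,\, k\}}^{\hat{s}_t + a_t} p_{\hat{s}_t + a_t - j} + \mathbb{1}\{a_t^+ \ge k\} \sum_{i=\hat{s}_t + a_t - a_t^+}^{\infty} p_i \ge \sum_{j=\max\{a_t^+ + 1,\, k\}}^{s_t + a_t} p_{s_t + a_t - j} + \mathbb{1}\{a_t^+ \ge k\} \sum_{i=s_t + a_t - a_t^+}^{\infty} p_i && (34)
\end{aligned}
$$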


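Using a proof by cases, all cases of k with respect to $a_t^+$ are considered. For each case, Equation (34) reduces to a valid statement. Note that the second term on both the left-hand and right-hand sides of Equation (34) is included only when both $j \ge k$ and $j = a_t^+$ hold. This term represents the event that demand meets or exceeds supply.

(a) $a_t^+ \ge k \Rightarrow a_t^+ + 1 > k$

The second term of each side appears, as both $j \ge k$ and $j = a_t^+$ are satisfied. Using Lemma 3, Equation (35) reduces to Equation (36).

$$\sum_{j=a_t^+ + 1}^{\hat{s}_t + a_t} p_{\hat{s}_t + a_t - j} + \sum_{i=\hat{s}_t + a_t - a_t^+}^{\infty} p_i \ge \sum_{j=a_t^+ + 1}^{s_t + a_t} p_{s_t + a_t - j} + \sum_{i=s_t + a_t - a_t^+}^{\infty} p_i \tag{35}$$

$$\sum_{i=0}^{\infty} p_i = \sum_{i=0}^{\infty} p_i \tag{36}$$

(b) $a_t^+ < k \Rightarrow a_t^+ + 1 \le k$

The second term of each side does not appear, as $j = a_t^+$ and $j \ge k$ can never hold together. Starting from Equation (34), Lemma 2 is used to arrive at a known valid statement.

$$
\begin{aligned}
& \sum_{j=k}^{\hat{s}_t + a_t} p_{\hat{s}_t + a_t - j} \ge \sum_{j=k}^{s_t + a_t} p_{s_t + a_t - j} && (37)\\
\iff{} & \sum_{i=0}^{\hat{s}_t + a_t - k} p_i \ge \sum_{i=0}^{s_t + a_t - k} p_i && (38)\\
\iff{} & \sum_{i=0}^{s_t + a_t - k} p_i + \sum_{i=s_t + a_t - k + 1}^{\hat{s}_t + a_t - k} p_i \ge \sum_{i=0}^{s_t + a_t - k} p_i && (39)\\
\iff{} & \sum_{i=s_t + a_t - k + 1}^{\hat{s}_t + a_t - k} p_i \ge 0 && (40)
\end{aligned}
$$

Equation (40) holds because probabilities are nonnegative.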

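3. $r_t(s_t, a_t)$ is a subadditive function on $S \times A'$.

The subadditivity of $r_t(s_t, a_t)$ implies that the incremental effect of charging more batteries (or discharging fewer batteries) on the expected immediate reward is smaller when the number of full batteries is greater. Consider $a_t \ge \hat{a}_t$ and $\hat{s}_t \ge s_t$, and use $s_t + a_t - s_{t+1} = \min\{D_t, s_t - a_t^-\}$ for any value that $D_t$ can assume. It can be shown that

$$
\begin{aligned}
& r_t(\hat{s}_t, a_t) + r_t(s_t, \hat{a}_t) \le r_t(\hat{s}_t, \hat{a}_t) + r_t(s_t, a_t) && (41)\\
\iff{} & \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, \hat{s}_t - a_t^-\} - K_t a_t^+ + J_t a_t^- + \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - \hat{a}_t^-\} - K_t \hat{a}_t^+ + J_t \hat{a}_t^- \\
& \le \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, \hat{s}_t - \hat{a}_t^-\} - K_t \hat{a}_t^+ + J_t \hat{a}_t^- + \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - a_t^-\} - K_t a_t^+ + J_t a_t^- && (42)\\
\iff{} & \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, \hat{s}_t - a_t^-\} + \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - \hat{a}_t^-\} \\
& \le \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, \hat{s}_t - \hat{a}_t^-\} + \sum_{j=0}^{\infty} P(D_t = j)\,\rho \min\{j, s_t - a_t^-\} && (43)
\end{aligned}
$$

The cost and discharging terms cancel, and every remaining term carries the common nonnegative factor $P(D_t = j)\rho$. It therefore suffices to show

$$\min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} \tag{44}$$

for all values of j. Using a proof by cases, every relevant case of $a_t$ and $\hat{a}_t$ is considered. Within each case, every scenario for demand $D_t = j$ relative to $\hat{s}_t - a_t^-$, $\hat{s}_t - \hat{a}_t^-$, $s_t - a_t^-$, and $s_t - \hat{a}_t^-$ is examined. The case where $\hat{a}_t \ge 0$ and $a_t < 0$ is excluded, as it contradicts the requirement $a_t \ge \hat{a}_t$ in the definition of subadditivity. For each case, Equation (44) reduces to a valid statement.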


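(a) $\hat{a}_t \ge 0,\ a_t \ge 0 \Rightarrow a_t^- = \hat{a}_t^- = 0$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (45)\\
\iff{} & \min\{j, \hat{s}_t\} + \min\{j, s_t\} = \min\{j, \hat{s}_t\} + \min\{j, s_t\} && (46)
\end{aligned}
$$

(b) $\hat{a}_t < 0,\ a_t \ge 0 \Rightarrow \hat{a}_t^- > 0,\ a_t^- = 0$

$$\min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} \tag{47}$$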

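Every possibility for demand j with respect to $\hat{s}_t$, $\hat{s}_t - \hat{a}_t^-$, $s_t$, and $s_t - \hat{a}_t^-$ is considered. Figure 2 is provided to aid the reader in visualizing the six possible scenarios. The ranges i-vi in the diagram correspond to the following scenarios i-vi.

Figure 2. Scenarios of demand with respect to inventory for case (b).

i. $j \le s_t - \hat{a}_t^- \Rightarrow j \le \hat{s}_t,\ j \le \hat{s}_t - \hat{a}_t^-,\ j \le s_t$

$$
\begin{aligned}
& \min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} && (48)\\
\iff{} & 2j = 2j && (49)
\end{aligned}
$$

ii. $j \ge \hat{s}_t \Rightarrow j \ge s_t - \hat{a}_t^-,\ j \ge s_t,\ j \ge \hat{s}_t - \hat{a}_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} && (50)\\
\iff{} & \hat{s}_t + s_t - \hat{a}_t^- = \hat{s}_t - \hat{a}_t^- + s_t && (51)
\end{aligned}
$$

iii. $j \ge s_t,\ j \le \hat{s}_t - \hat{a}_t^- \Rightarrow j \ge s_t - \hat{a}_t^-,\ j \le \hat{s}_t$

$$
\begin{aligned}
& \min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} && (52)\\
\iff{} & j + s_t - \hat{a}_t^- \le j + s_t \iff \hat{a}_t^- \ge 0 && (53)
\end{aligned}
$$

iv. $j \le s_t,\ j \le \hat{s}_t - \hat{a}_t^-,\ j \ge s_t - \hat{a}_t^- \Rightarrow j \le \hat{s}_t$

$$
\begin{aligned}
& \min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} && (54)\\
\iff{} & j + s_t - \hat{a}_t^- \le j + j \iff s_t - \hat{a}_t^- \le j && (55)
\end{aligned}
$$

v. $j \ge s_t,\ j \ge \hat{s}_t - \hat{a}_t^-,\ j \le \hat{s}_t \Rightarrow j \ge s_t - \hat{a}_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} && (56)\\
\iff{} & j + s_t - \hat{a}_t^- \le \hat{s}_t - \hat{a}_t^- + s_t \iff j \le \hat{s}_t && (57)
\end{aligned}
$$

vi. $j \le s_t,\ j \ge \hat{s}_t - \hat{a}_t^- \Rightarrow j \ge s_t - \hat{a}_t^-,\ j \le \hat{s}_t$

$$
\begin{aligned}
& \min\{j, \hat{s}_t\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t\} && (58)\\
\iff{} & j + s_t - \hat{a}_t^- \le \hat{s}_t - \hat{a}_t^- + j \iff s_t \le \hat{s}_t && (59)
\end{aligned}
$$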

(c) $\hat{a}_t < 0,\ a_t < 0 \Rightarrow \hat{a}_t^- > 0,\ a_t^- > 0,\ \hat{a}_t^- \ge a_t^-$

Every possibility for demand j with respect to $\hat{s}_t - a_t^-$, $\hat{s}_t - \hat{a}_t^-$, $s_t - a_t^-$, and $s_t - \hat{a}_t^-$ is considered. Figure 3 is provided to aid the reader in visualizing the six possible scenarios. The ranges i-vi in the diagram correspond to the following scenarios i-vi.

Figure 3. Scenarios of demand with respect to inventory for case (c).


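i. $j \le s_t - \hat{a}_t^- \Rightarrow j \le \hat{s}_t - a_t^-,\ j \le \hat{s}_t - \hat{a}_t^-,\ j \le s_t - a_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (60)\\
\iff{} & j + j \le j + j \iff 2j = 2j && (61)
\end{aligned}
$$

ii. $j \ge \hat{s}_t - a_t^- \Rightarrow j \ge s_t - \hat{a}_t^-,\ j \ge \hat{s}_t - \hat{a}_t^-,\ j \ge s_t - a_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (62)\\
\iff{} & \hat{s}_t - a_t^- + s_t - \hat{a}_t^- = \hat{s}_t - \hat{a}_t^- + s_t - a_t^- && (63)
\end{aligned}
$$

iii. $j \ge s_t - a_t^-,\ j \le \hat{s}_t - \hat{a}_t^- \Rightarrow j \ge s_t - \hat{a}_t^-,\ j \le \hat{s}_t - a_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (64)\\
\iff{} & j + s_t - \hat{a}_t^- \le j + s_t - a_t^- \iff \hat{a}_t^- \ge a_t^- && (65)
\end{aligned}
$$

iv. $j \le s_t - a_t^-,\ j \le \hat{s}_t - \hat{a}_t^-,\ j \ge s_t - \hat{a}_t^- \Rightarrow j \le \hat{s}_t - a_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (66)\\
\iff{} & j + s_t - \hat{a}_t^- \le j + j \iff s_t - \hat{a}_t^- \le j && (67)
\end{aligned}
$$

v. $j \ge s_t - a_t^-,\ j \ge \hat{s}_t - \hat{a}_t^-,\ j \le \hat{s}_t - a_t^- \Rightarrow j \ge s_t - \hat{a}_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (68)\\
\iff{} & j + s_t - \hat{a}_t^- \le \hat{s}_t - \hat{a}_t^- + s_t - a_t^- \iff j \le \hat{s}_t - a_t^- && (69)
\end{aligned}
$$

vi. $j \le s_t - a_t^-,\ j \ge \hat{s}_t - \hat{a}_t^- \Rightarrow j \ge s_t - \hat{a}_t^-,\ j \le \hat{s}_t - a_t^-$

$$
\begin{aligned}
& \min\{j, \hat{s}_t - a_t^-\} + \min\{j, s_t - \hat{a}_t^-\} \le \min\{j, \hat{s}_t - \hat{a}_t^-\} + \min\{j, s_t - a_t^-\} && (70)\\
\iff{} & j + s_t - \hat{a}_t^- \le \hat{s}_t - \hat{a}_t^- + j \iff s_t \le \hat{s}_t && (71)
\end{aligned}
$$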

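4. $g_t(k \mid s_t, a_t)$ is a subadditive function on $S \times A'$ for all $k \in S$.

The subadditivity of $g_t(k \mid s_t, a_t)$ has the following meaning. Charging more batteries (or discharging fewer) raises the probability of moving to a state with at least k full batteries. That incremental effect is smaller when the number of full batteries is greater. Consider $a_t \ge \hat{a}_t$ and $\hat{s}_t \ge s_t$; it can be shown that

$$g_t(k \mid \hat{s}_t, a_t) + g_t(k \mid s_t, \hat{a}_t) \le g_t(k \mid \hat{s}_t, \hat{a}_t) + g_t(k \mid s_t, a_t) \tag{72}$$

which, by Lemma 1, is equivalent to

$$
\begin{aligned}
& \sum_{j=\max\{a_t^+ + 1,\, k\}}^{\hat{s}_t + a_t} p_{\hat{s}_t + a_t - j} + \mathbb{1}\{a_t^+ \ge k\} \sum_{i=\hat{s}_t + a_t - a_t^+}^{\infty} p_i + \sum_{j=\max\{\hat{a}_t^+ + 1,\, k\}}^{s_t + \hat{a}_t} p_{s_t + \hat{a}_t - j} + \mathbb{1}\{\hat{a}_t^+ \ge k\} \sum_{i=s_t + \hat{a}_t - \hat{a}_t^+}^{\infty} p_i \\
& \le \sum_{j=\max\{\hat{a}_t^+ + 1,\, k\}}^{\hat{s}_t + \hat{a}_t} p_{\hat{s}_t + \hat{a}_t - j} + \mathbb{1}\{\hat{a}_t^+ \ge k\} \sum_{i=\hat{s}_t + \hat{a}_t - \hat{a}_t^+}^{\infty} p_i + \sum_{j=\max\{a_t^+ + 1,\, k\}}^{s_t + a_t} p_{s_t + a_t - j} + \mathbb{1}\{a_t^+ \ge k\} \sum_{i=s_t + a_t - a_t^+}^{\infty} p_i && (73)
\end{aligned}
$$

Using a proof by cases, every relevant case of k with respect to $a_t^+$ and $\hat{a}_t^+$ is considered. For each case, Equation (73) reduces to a valid statement. The function $g_t(k \mid s_t, a_t)$ comprises two terms. The first term covers the outcomes in which demand does not exhaust the supply of batteries. The second is the cumulative probability that demand equals or exceeds supply. Each case of the proof indicates which terms are included, based on the relationship between k, $a_t^+$, and $\hat{a}_t^+$. Since $a_t \ge \hat{a}_t$ implies $a_t^+ \ge \hat{a}_t^+$, three cases arise.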
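(a) $a_t^+ \ge k,\ \hat{a}_t^+ \ge k \Rightarrow a_t^+ + 1 > k,\ \hat{a}_t^+ + 1 > k$

For this case, demand for battery swaps may exceed supply, so both terms of $g_t(k \mid s_t, a_t)$ appear. By Lemma 3, each pair of terms sums to $\sum_{i=0}^{\infty} p_i$.

$$
\begin{aligned}
& \sum_{j=a_t^+ + 1}^{\hat{s}_t + a_t} p_{\hat{s}_t + a_t - j} + \sum_{i=\hat{s}_t + a_t - a_t^+}^{\infty} p_i + \sum_{j=\hat{a}_t^+ + 1}^{s_t + \hat{a}_t} p_{s_t + \hat{a}_t - j} + \sum_{i=s_t + \hat{a}_t - \hat{a}_t^+}^{\infty} p_i \\
& \le \sum_{j=\hat{a}_t^+ + 1}^{\hat{s}_t + \hat{a}_t} p_{\hat{s}_t + \hat{a}_t - j} + \sum_{i=\hat{s}_t + \hat{a}_t - \hat{a}_t^+}^{\infty} p_i + \sum_{j=a_t^+ + 1}^{s_t + a_t} p_{s_t + a_t - j} + \sum_{i=s_t + a_t - a_t^+}^{\infty} p_i && (74)\\
\iff{} & \sum_{i=0}^{\infty} p_i + \sum_{i=0}^{\infty} p_i \le \sum_{i=0}^{\infty} p_i + \sum_{i=0}^{\infty} p_i && (75)\\
\iff{} & 2\sum_{i=0}^{\infty} p_i = 2\sum_{i=0}^{\infty} p_i && (76)
\end{aligned}
$$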

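(b) $\hat{a}_t^+ < k,\ a_t^+ \ge k \Rightarrow \hat{a}_t^+ + 1 \le k,\ a_t^+ + 1 > k$

For this case, because $a_t^+ \ge k$, the second term of $g_t(k \mid s_t, a_t)$ appears when action $a_t$ is taken, as demand can exceed supply. However, because $\hat{a}_t^+ < k$, the second term does not appear when action $\hat{a}_t$ is taken. The corresponding next state $j = \hat{a}_t^+$ falls below the threshold k. Lemmas 2 and 3 are used to reduce Equation (73).

$$
\begin{aligned}
& \sum_{j=a_t^+ + 1}^{\hat{s}_t + a_t} p_{\hat{s}_t + a_t - j} + \sum_{i=\hat{s}_t + a_t - a_t^+}^{\infty} p_i + \sum_{j=k}^{s_t + \hat{a}_t} p_{s_t + \hat{a}_t - j} \\
& \le \sum_{j=k}^{\hat{s}_t + \hat{a}_t} p_{\hat{s}_t + \hat{a}_t - j} + \sum_{j=a_t^+ + 1}^{s_t + a_t} p_{s_t + a_t - j} + \sum_{i=s_t + a_t - a_t^+}^{\infty} p_i && (77)\\
\iff{} & \sum_{i=0}^{\infty} p_i + \sum_{i=0}^{s_t + \hat{a}_t - k} p_i \le \sum_{i=0}^{\hat{s}_t + \hat{a}_t - k} p_i + \sum_{i=0}^{\infty} p_i && (78)\\
\iff{} & \sum_{i=0}^{s_t + \hat{a}_t - k} p_i \le \sum_{i=0}^{\hat{s}_t + \hat{a}_t - k} p_i && (79)\\
\iff{} & \sum_{i=0}^{s_t + \hat{a}_t - k} p_i \le \sum_{i=0}^{s_t + \hat{a}_t - k} p_i + \sum_{i=s_t + \hat{a}_t - k + 1}^{\hat{s}_t + \hat{a}_t - k} p_i && (80)\\
\iff{} & 0 \le \sum_{i=s_t + \hat{a}_t - k + 1}^{\hat{s}_t + \hat{a}_t - k} p_i && (81)
\end{aligned}
$$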

i=st+dt—k+l 
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(c)  af  <  k  af  <  k,  df  +  1  <  k,  af  +  1  <  k 

For  this  case,  demand  for  battery  swaps  never  exceeds  supply  therefore, 
the  second  term  of  gt{k\st,at)  does  not  appear  when  either  action  at  or 
action  dt  are  taken. 


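$$\sum_{j=k}^{\hat{s}_t+a_t} p_{\hat{s}_t+a_t-j} + \sum_{j=k}^{s_t+\hat{a}_t} p_{s_t+\hat{a}_t-j} \le \sum_{j=k}^{\hat{s}_t+\hat{a}_t} p_{\hat{s}_t+\hat{a}_t-j} + \sum_{j=k}^{s_t+a_t} p_{s_t+a_t-j} \tag{82}$$

$$\sum_{i=0}^{\hat{s}_t+a_t-k} p_i + \sum_{i=0}^{s_t+\hat{a}_t-k} p_i \le \sum_{i=0}^{\hat{s}_t+\hat{a}_t-k} p_i + \sum_{i=0}^{s_t+a_t-k} p_i \tag{83}$$

$$\sum_{i=0}^{s_t+a_t-k} p_i + \sum_{i=s_t+a_t-k+1}^{\hat{s}_t+a_t-k} p_i + \sum_{i=0}^{s_t+\hat{a}_t-k} p_i \le \sum_{i=0}^{s_t+\hat{a}_t-k} p_i + \sum_{i=s_t+\hat{a}_t-k+1}^{\hat{s}_t+\hat{a}_t-k} p_i + \sum_{i=0}^{s_t+a_t-k} p_i \tag{84}$$

$$\sum_{i=s_t+a_t-k+1}^{\hat{s}_t+a_t-k} p_i \le \sum_{i=s_t+\hat{a}_t-k+1}^{\hat{s}_t+\hat{a}_t-k} p_i \tag{85}$$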


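In Equation (85), the number of terms on each side is exactly the same ($\hat{s}_t - s_t$); however, because $a_t \ge \hat{a}_t$, the summation on the left-hand side starts at a larger index. Pairing the terms in order, each left-hand term $p_{i + a_t - \hat{a}_t}$ is then no larger than the corresponding right-hand term $p_i$ whenever $p_j$ is nonincreasing in $j$. Therefore, Equation (85) holds when $p_j = P(D_t = j)$ is governed by a nonincreasing discrete distribution.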


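5. $r_N(s_N)$ is nondecreasing in $s_N$.

Consider $\hat{s}_N \ge s_N$; it can be shown that $r_N(\hat{s}_N) \ge r_N(s_N)$. Because the swap revenue $\rho$ is positive, this expression reduces to a known valid statement:

$$r_N(\hat{s}_N) \ge r_N(s_N) \iff \rho\,\hat{s}_N \ge \rho\,s_N \iff \hat{s}_N \ge s_N \tag{86}$$

□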
Consider two possibilities for the state (i.e., the number of full batteries) at a swap station, $\hat{s}_t \ge s_t$. This theorem states that there exists an optimal decision rule under which the swap station never charges fewer (or discharges more) batteries in state $s_t$ than in state $\hat{s}_t$. Using this result, exact solution methods and two benchmark solution methods are outlined.
IV. Methodology


The objective in solving this Markov decision problem (MDP) is to determine a policy that maximizes the expected total reward criterion expressed in Equation. The set of states, $S$, is finite, and the action set, $A_{s_t}$, is finite for each $s_t \in S$. Therefore, there exists a deterministic Markov policy that is optimal. An optimal policy for this finite-horizon model is found using the backward induction algorithm [16]. This dynamic programming algorithm finds the optimal policy, specifically the optimal number of batteries to charge and discharge at each decision epoch, that maximizes the expected total reward. The backward induction algorithm finds sets $A^*_{s_t,t}$ that contain all actions in $A_{s_t}$ attaining the maximum in the optimality equations. The algorithm also evaluates the policy and computes the expected total reward from each period to the end of the decision-making horizon.

There exists an optimal policy with a monotone nonincreasing structure when demand is governed by a discrete nonincreasing distribution; thus, the monotone backward induction algorithm [16] is also used to find an optimal policy, as outlined in Algorithm 1. The nonincreasing monotone backward induction algorithm modifies the original algorithm by redefining the action set at each iteration over $s_t$ so that it is limited by the optimal decision rule for state $s_t - 1$, for each $t \in T$. For example, if the optimal decision rule at $s_t = 10$ is to charge 20 batteries, then the action space for $s_t = 11$ becomes $A'_{11} = \{\max(-11, -\Phi), \dots, 0, \dots, \min(20, \Phi)\}$ instead of $A_{11} = \{\max(-11, -\Phi), \dots, 0, \dots, \min(M - 11, \Phi)\}$. These modifications to the algorithm yield an optimal policy when demand is governed by a discrete nonincreasing distribution; note, however, that there may be alternative optima that are not monotone.

When there are $|S|$ states, $|A'|$ actions in each state, where $A' = \bigcup_{s_t \in S} A_{s_t}$, and $N$ time periods, the backward induction algorithm requires $(N-1)|A'||S|^2$ multiplications to determine the optimal policy. This is a considerable improvement over complete enumeration of all possible solutions, which takes $(N-1)|A'|^{(N-1)|S|}|S|^2$ multiplications. In the worst case, the computational effort of the monotone backward induction algorithm equals that of backward induction; however, when the policy is nonincreasing, the action sets shrink as $s_t$ increases, which reduces the number of actions that must be evaluated [16].

Algorithm 1 Nonincreasing Monotone Backward Induction [16]

1: Set $t = N$ and $u_N^*(s_N) = r_N(s_N)$ for all $s_N \in S$
2: while $t > 1$ do
3: Set $t = t - 1$, set $s_t = 0$, and set $A'_0 = A_0$
4: while $s_t \le M$ do
5: Compute $u_t^*(s_t)$ by $u_t^*(s_t) = \max_{a_t \in A'_{s_t}} \{ r_t(s_t,a_t) + \sum_{j \in S} p_t(j|s_t,a_t)\, u_{t+1}^*(j) \}$
6: Set the actions that attain $u_t^*(s_t)$: $A^*_{s_t,t} = \arg\max_{a_t \in A'_{s_t}} \{ r_t(s_t,a_t) + \sum_{j \in S} p_t(j|s_t,a_t)\, u_{t+1}^*(j) \}$
7: if $s_t < M$ then
8: Define the action space for $s_t + 1$ by $A'_{s_t+1} = \{ a \in A_{s_t+1} : a \le \min\{a' \in A^*_{s_t,t}\} \}$
9: end if
10: Set $s_t = s_t + 1$
11: end while
12: end while
13: Calculate the expected total reward for the entire horizon, $v_N^{\pi^*}(s_1) = u_1^*(s_1)$
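A compact Python sketch of Algorithm 1 follows. It is an illustration under stated assumptions, not the thesis implementation. The reward $r_t(s,a)$, the terminal reward $r_N(s)$, and the transition probabilities are passed in as functions, and ties are broken by keeping the smallest maximizer. The toy model at the bottom is hypothetical: geometric demand with $p_d = 0.5^{d+1}$, $\rho = 5$, $K_t = J_t = 2$, and an assumed reward equal to expected swap revenue minus charging cost plus discharge revenue.

```python
from typing import Callable, Dict, List, Optional, Tuple

Reward = Callable[[int, int, int], float]                  # r_t(s, a)
Transition = Callable[[int, int, int], Dict[int, float]]   # {j: p_t(j | s, a)}


def monotone_backward_induction(N: int, M: int, Phi: int, reward: Reward,
                                terminal: Callable[[int], float], trans: Transition
                                ) -> Tuple[Dict[Tuple[int, int], int], List[float]]:
    """Sketch of Algorithm 1; returns a nonincreasing decision rule and u_1^*(s)."""
    u = [terminal(s) for s in range(M + 1)]            # step 1: u_N^*(s) = r_N(s)
    policy: Dict[Tuple[int, int], int] = {}
    for t in range(N - 1, 0, -1):                      # steps 2-3: t = N-1, ..., 1
        u_next, u = u, [0.0] * (M + 1)
        cap: Optional[int] = None                      # A'_0 is the full action set A_0
        for s in range(M + 1):                         # step 4: s_t = 0, ..., M
            hi = min(M - s, Phi)
            if cap is not None:
                hi = min(hi, cap)                      # step 8: a <= min A*_{s-1,t}
            best, argmax = float("-inf"), []
            for a in range(max(-s, -Phi), hi + 1):     # steps 5-6
                val = reward(t, s, a) + sum(p * u_next[j] for j, p in trans(t, s, a).items())
                if val > best + 1e-9:
                    best, argmax = val, [a]
                elif val >= best - 1e-9:
                    argmax.append(a)                   # tie: another maximizer
            u[s] = best
            cap = min(argmax)                          # smallest maximizer bounds state s + 1
            policy[(t, s)] = cap
    return policy, u                                   # step 13: u[s] = u_1^*(s)


# Hypothetical toy model (not the thesis data): geometric demand, constant prices.
RHO, K, J = 5.0, 2.0, 2.0


def pmf(d: int) -> float:
    return 0.5 ** (d + 1)                              # P(D = d), d = 0, 1, 2, ...


def trans(t: int, s: int, a: int) -> Dict[int, float]:
    avail, a_plus = s + min(a, 0), max(a, 0)
    probs = {s + a - d: pmf(d) for d in range(avail)}         # demand below supply
    probs[a_plus] = 1.0 - sum(pmf(d) for d in range(avail))    # demand >= supply
    return probs


def reward(t: int, s: int, a: int) -> float:
    avail = s + min(a, 0)
    tail = 1.0 - sum(pmf(d) for d in range(avail))
    swaps = sum(d * pmf(d) for d in range(avail)) + avail * tail   # E[min(D, avail)]
    return RHO * swaps - K * max(a, 0) + J * max(-a, 0)


policy, u1 = monotone_backward_induction(5, 6, 3, reward, lambda s: RHO * s, trans)
print([policy[(1, s)] for s in range(7)])
```

Because each state's upper action bound is the smallest maximizer found for the previous state, the returned decision rule is nonincreasing in $s_t$ by construction. Whether it is also optimal depends on the monotonicity conditions established for the model.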
4.1 Benchmark Policies


Two benchmark policies are considered in which the swap station charges up to, or discharges down to, a set target level $\zeta_t$ at each decision epoch $t$. The first benchmark policy is a stationary benchmark policy, which picks a single target level $\zeta$ and sets $\zeta_t = \zeta$ for all time periods $t$. The second is a dynamic benchmark policy, which uses a distinct $\zeta_t$ for each time period $t$. Given the target levels, the action for each state and time period can be determined with a simple calculation. Thus, these policies can be easily implemented by a swap station manager.

If the state $s_t$ is less than or equal to the target level $\zeta_t$, the swap station does not have as many fully charged batteries as desired; thus, it will charge or do nothing. The most that can be charged at any point in time, denoted $C$, is given by

$$C = \min\{M - s_t, \Phi\}. \tag{87}$$

If $s_t$ is greater than $\zeta_t$, the swap station has more fully charged batteries than desired; thus, it will discharge. The most that can be discharged at any point in time (i.e., the most negative action), denoted $D$, is given by

$$D = \max\{-s_t, -\Phi\}. \tag{88}$$

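The decision rule $d_t(s_t)$ is given by the following.

$$d_t(s_t) = \begin{cases} \min\{\zeta_t - s_t, C\} & \text{if } s_t \le \zeta_t \\ \max\{\zeta_t - s_t, D\} & \text{if } s_t > \zeta_t \end{cases} \tag{89}$$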
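The benchmark decision rule in Equations (87)-(89) needs only a few comparisons per state. The minimal Python sketch below is illustrative: the function name `benchmark_action` is hypothetical, and the example values $M = 50$, $\Phi = 10$, and a target level of 25 are arbitrary.

```python
def benchmark_action(s: int, zeta: int, M: int, Phi: int) -> int:
    """Decision rule d_t(s_t) of Eq. (89) for a given target level zeta_t."""
    C = min(M - s, Phi)          # Eq. (87): most batteries that can be charged
    D = max(-s, -Phi)            # Eq. (88): most negative (discharge) action
    if s <= zeta:
        return min(zeta - s, C)  # charge toward the target (0 if already there)
    return max(zeta - s, D)      # discharge toward the target


# Example with M = 50, Phi = 10, zeta_t = 25 (illustrative values).
print([benchmark_action(s, 25, 50, 10) for s in (10, 20, 25, 30, 40)])
# [10, 5, 0, -5, -10]
```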

For the first benchmark policy, a stationary target level $\zeta_t = \zeta$ is derived, where $\zeta$ is calculated as a percentage of the number of batteries $M$ using some constant $\beta$:

$$\zeta = \lfloor \beta M + 0.5 \rfloor. \tag{90}$$

Equation (90) calculates $\zeta$ using a traditional rounding function. In the second benchmark policy, dynamic target levels $\zeta_t$ are derived at each decision epoch as a rounded function of the number of batteries $M$ and the charging costs $K_t$ using Equation (91), for constants $\beta_1$ and $\beta_2$, where

$$\zeta_t = \begin{cases} \lfloor \beta_1 M + 0.5 \rfloor & \text{if } K_t \ge K_{t+1} \\ \lfloor \beta_2 M + 0.5 \rfloor & \text{if } K_t < K_{t+1} \end{cases} \tag{91}$$

These policies are validated in Chapter V as usable for real-time decision making because of their speed of calculation and accuracy.
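The target levels can be computed in the same spirit. The sketch below rests on several assumptions made for illustration. It writes the constants as $\beta$ (stationary) and $\beta_1$, $\beta_2$ (dynamic). It applies $\beta_1$ when $K_t \ge K_{t+1}$ and $\beta_2$ when $K_t < K_{t+1}$. The function names and the cost list are hypothetical.

```python
import math
from typing import List, Sequence


def round_half_up(x: float) -> int:
    """floor(x + 0.5), the traditional rounding used in Eqs. (90) and (91)."""
    return math.floor(x + 0.5)


def stationary_target(M: int, beta: float) -> int:
    return round_half_up(beta * M)                       # Eq. (90)


def dynamic_targets(M: int, K: Sequence[float], beta1: float, beta2: float) -> List[int]:
    """zeta_t for each epoch t that has a next-period cost K[t + 1] (Eq. (91))."""
    return [round_half_up((beta1 if K[t] >= K[t + 1] else beta2) * M)
            for t in range(len(K) - 1)]


print(stationary_target(50, 0.5))                        # 25
print(dynamic_targets(50, [3.0, 2.0, 4.0], 0.25, 0.75))  # [13, 38]
```

Note that $\lfloor x + 0.5 \rfloor$ rounds halves upward, whereas Python's built-in `round` rounds halves to the nearest even integer (`round(2.5)` is `2`), which is why the sketch uses `math.floor` explicitly.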
V. Computational Tests


In this chapter, realistic data are used to computationally test the PHEV-SSMP on a variety of scenarios. From the optimal policies, insights that would benefit a swap station manager are deduced. Two Latin hypercube designed experiments are also used to gain insights, with a focus on the expected total profit and the percentage of demand that is met when the optimal policy is implemented. Further, the accuracy and speed of the two benchmark policies are compared to those of the optimal policy and optimal solution method.

The time horizon examined is a full week in one-hour increments; thus, the time horizon is $N = (24)(7) + 1 = 169$ and the number of decision epochs is $N - 1 = 168$. The first decision is made on Monday at 0000, the second on Monday at 0100, and so on until the last decision is made on Sunday at 2300. Historical hourly charging cost data from 2013 in the Capital Region, New York, obtained from National Grid [36], are used. One week from each season is used in this analysis because of the varying climate and the drastic variation in prices throughout the year: January 21-27 for Winter, April 15-21 for Spring, July 15-21 for Summer, and September 23-29 for Fall. Note that, for 2013, the sum of power prices over every hour of the week is at its maximum for January 21-27 and at its minimum for September 23-29. The charging cost per kWh at each time $t$ is multiplied by 60 to calculate the cost to charge one battery, $K_t$, which is consistent with the Tesla Model S 60 kWh battery option [37]; such a charge can be completed in an hour with level 2 or 3 charging [18]. The charging cost per battery per hour for the four weeks of interest can be seen in Figure 4. For these computational tests, the discharge revenue $J_t$ is set equal to a percentage of the charging cost, $J_t = \alpha K_t$, with $\alpha$ between 0.75 and 1.25. The $\alpha$ parameter gives insight into the incentives needed to encourage the swap station to discharge at favorable points in time.
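The horizon bookkeeping and the per-battery prices can be sketched as follows. The hourly price list passed to `battery_costs` is a hypothetical input standing in for the National Grid data, and the helper names are illustrative.

```python
from typing import List, Sequence, Tuple

N = 24 * 7 + 1          # 169: length of the weekly horizon
EPOCHS = N - 1          # 168 hourly decision epochs
BATTERY_KWH = 60        # Tesla Model S 60 kWh battery option
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def epoch_label(t: int) -> str:
    """Day and clock time of decision epoch t = 1, ..., 168 (t = 1 is Monday 0000)."""
    day, hour = divmod(t - 1, 24)
    return f"{DAYS[day]} {hour:02d}00"


def battery_costs(price_per_kwh: Sequence[float], alpha: float) -> Tuple[List[float], List[float]]:
    """K_t = 60 * (price per kWh) and J_t = alpha * K_t for each hour."""
    K = [BATTERY_KWH * p for p in price_per_kwh]
    J = [alpha * k for k in K]
    return K, J


print(N, EPOCHS, epoch_label(1), epoch_label(2), epoch_label(168))
# 169 168 Monday 0000 Monday 0100 Sunday 2300
```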
Figure 4. Charging cost $K_t$ per battery per hour in the Capital Region, NY.

A methodology similar to that of Nurre et al. [20] is used to derive the distribution of swap demand at each hour. The authors assume that arrival behavior at a swap station will mimic the behavior currently observed at gas stations. As such, they calculate the percentage of people who visit a gas station in each hour of the day and on each day of the week based on historical data from Chevron gas stations [38], assuming that a customer visits a gas station once per week. This percentage is used to calculate the mean arrival rate of customers, $\lambda_t$, for each decision epoch $t$. Specifically, the PHEV-SSMP considers an area with $\gamma$ PHEV users and sets $\lambda_t$ equal to the product of $\gamma$ and the percentage of customers visiting the station at time $t$ from Nurre et al. [20].

Two distributions for modeling swap demand $D_t$ are considered: geometric and Poisson. When swap demand $D_t$ follows a geometric distribution, parameter $V_t$ is set to When swap demand $D_t$ follows a Poisson distribution, its parameter is set to $\lambda_t$. Note that the geometric distribution is a nonincreasing discrete distribution; therefore, there exists an optimal policy that is monotone nonincreasing. The mean arrival rate of customers, $\lambda_t$, for each hour of each day in a location with $\gamma = 3{,}000$ PHEVs can be seen in Figure 5. The arrival rate of customers is assumed to be the same for each week of the year.
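The following Python sketch shows how hourly demand distributions could be built from $\lambda_t$. The hourly share of 0.2% is made up for illustration. Because the geometric parameter is not stated here, the sketch assumes a geometric distribution on $\{0, 1, 2, \dots\}$ whose mean equals $\lambda_t$. The printed checks show that this geometric pmf is nonincreasing while a Poisson pmf with mean above 1 is not.

```python
import math


def arrival_rate(gamma: int, hourly_share: float) -> float:
    """lambda_t = gamma * (share of weekly station visits that fall in hour t)."""
    return gamma * hourly_share


def poisson_pmf(j: int, lam: float) -> float:
    """P(D_t = j) for D_t ~ Poisson(lam), evaluated in log space for stability."""
    if lam == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def geometric_pmf(j: int, lam: float) -> float:
    """ASSUMED form: geometric on {0, 1, ...} with mean lam, success prob 1 / (1 + lam)."""
    theta = 1.0 / (1.0 + lam)
    return theta * (1.0 - theta) ** j


lam = arrival_rate(3000, 0.002)   # hypothetical share: 0.2% of weekly visits in this hour
print(all(geometric_pmf(j + 1, lam) <= geometric_pmf(j, lam) for j in range(50)))  # True
print(poisson_pmf(1, lam) > poisson_pmf(0, lam))                                   # True
```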


Figure 5. Mean arrival rate of customers $\lambda_t$ in a location with 3,000 PHEVs by time of day and day of the week.

To computationally test the PHEV-SSMP, two designed experiments are conducted. The first designed experiment is used to gain general insights when a wide range of inputs is considered. The second designed experiment is conducted with more targeted values based on the results of the first. With this second experiment, values for the controllable parameters at a swap station can be determined. For both experiments, the expected total reward, the percentage of met demand, and the policies are used to infer policy insights.

For the first designed experiment, the expected total reward is used as the response variable. When demand follows the geometric distribution, this is found using the monotone nonincreasing backward induction algorithm [16]. When demand follows a time-dependent Poisson process, two policies with corresponding expected total rewards are found: the optimal policy, using the backward induction algorithm, and a heuristic policy, using the monotone backward induction algorithm. Note that the monotone policy is not always optimal; however, it has been verified empirically to be optimal in almost all cases.
A 50-scenario Latin hypercube designed experiment is performed; this is a widely used design for deterministic computer simulation models [39]. This space-filling design spreads the design points nearly uniformly to better characterize the response surface in the region of experimentation. Because four separate weeks of charging cost data are considered, $K_t$ is a categorical factor with four levels representing the four weeks extracted from the year. The 50-scenario design is conducted for each of the four seasons and each of the two demand distributions, resulting in a total of 400 scenarios. Factors used in the design include the total number of batteries $M$, the charging capacity $\Phi$, the total number of PHEVs in the local area $\gamma$, the revenue per battery swap $\rho$, and the discharge revenue as a percentage of the charging cost, $\alpha$. Using JMP 11 Pro software, a 50-scenario design is generated with the levels of each factor ranging between two values. The high and low levels used for this experiment can be seen in Table 1. The charging costs for the four weeks of interest are representative of Winter, Spring, Summer, and Fall, respectively. The low value for the swap revenue $\rho$ is set below the minimum charging cost over the four weeks, and the high value for $\rho$ is set above the maximum charging cost.

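Table 1. Factor levels used for the first Latin hypercube designed experiment.

| Factor | Symbol | Low | High |
|---|---|---|---|
| Total Number of Batteries | $M$ | 50 | 200 |
| Charging Capacity | $\Phi$ | $\lfloor 0.25M \rfloor$ | $M$ |
| Swap Revenue (dollars) | $\rho$ | 1 | 20 |
| Percent Discharge Revenue (% of $K_t$) | $\alpha$ | 0.75 | 1.25 |
| PHEVs in the Local Area | $\gamma$ | 1,000 | 6,000 |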
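For readers without JMP, a basic Latin hypercube over the Table 1 ranges can be sketched in Python. This is a plain randomized Latin hypercube and may differ from the space-filling design JMP generated. The seed is an assumption, as is the handling of the $\Phi$ range, which is sampled as a fraction of the interval from $\lfloor 0.25M \rfloor$ to $M$.

```python
import math
import random
from typing import Dict, List


def latin_hypercube(n: int, d: int, rng: random.Random) -> List[List[float]]:
    """n points in [0, 1)^d with exactly one point per interval [i/n, (i+1)/n) in every dimension."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)                               # random pairing of strata across factors
        columns.append([(i + rng.random()) / n for i in strata])
    return [[col[r] for col in columns] for r in range(n)]


def first_design(n: int = 50, seed: int = 2016) -> List[Dict[str, float]]:
    """Scale a unit Latin hypercube to the Table 1 ranges (Phi's low level depends on M)."""
    rng = random.Random(seed)
    design = []
    for u_m, u_phi, u_rho, u_alpha, u_gamma in latin_hypercube(n, 5, rng):
        M = round(50 + u_m * 150)                         # 50 .. 200 batteries
        phi_low = math.floor(0.25 * M)                    # floor(0.25 M)
        design.append({
            "M": M,
            "Phi": round(phi_low + u_phi * (M - phi_low)),  # floor(0.25 M) .. M
            "rho": 1 + u_rho * 19,                          # 1 .. 20 dollars per swap
            "alpha": 0.75 + u_alpha * 0.5,                  # 0.75 .. 1.25
            "gamma": round(1000 + u_gamma * 5000),          # 1,000 .. 6,000 PHEVs
        })
    return design


design = first_design()
print(design[0])
```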

When the time-dependent Poisson process is used for demand, the monotone policy was optimal in all but 22 scenarios. Of these 22 scenarios, the largest percentage gap in expected total reward compared to the optimal was 0.77%. Therefore, while the monotone policy is not always optimal when demand does not follow a nonincreasing distribution, it is empirically observed to provide a good approximation. Further, very similar optimal policies are seen when the Poisson and geometric distributions are used to model demand. Only 41 scenarios resulted in a different expected total reward, with the largest gap being 2.7%. Discharging is often favored when demand follows a Poisson process; however, discharging also occurs when demand is governed by a geometric distribution. Because of these similarities, the results presented herein apply to both distributions unless otherwise stated.

Results from the designed experiment indicate that all factors have a significant effect on the expected total reward at a 95% confidence level, except for the charging costs $K_t$. This indicates that even though seasonal charging prices vary drastically, the variation does not significantly affect the swap station's profit. As expected, the swap revenue $\rho$ has the greatest impact on the expected total reward. Thus, the most effective way to increase the expected total reward would be to increase the swap cost; however, this relies on the unrealistic assumption that demand for swaps is independent of the swap cost. Future work should consider the sensitivity of customers to the price of swapping, since customers can use a charging station instead of swapping. Next, the significant interaction terms with $M$ are examined: $M\Phi$, $M\rho$, and $M\alpha$. The interaction plots produced by JMP 11 Pro software for the second-order terms can be seen in Figure 6.

When $M$ is at the low level, the charging capacity $\Phi$ does not have a significant effect on the expected total reward, and the revenue earned from discharging, $\alpha$, has only a small effect. While increasing the swap revenue $\rho$ significantly increases the expected total reward when $M$ is low, it has a greater effect when $M$ is high. Furthermore, when $M$ is high, $\Phi$ and $\alpha$ at the high level result in a significantly higher expected total reward than $\Phi$ and $\alpha$ at the low level. From this examination, the following policy insights are deduced. Having the correct number of batteries $M$ is an integral part of optimally managing the swap station. When $M$ is too low for the demand, even a higher charging capacity and a greater percentage earned from discharging cannot make up for the revenue lost by being unable to swap because there are too few batteries. Further, if the swap station is to serve a dual purpose, both satisfying swap demand and aiding the power grid via discharging, having a sufficient number of batteries $M$ is essential. The second designed experiment examines what $M$ should be with respect to the number of PHEVs in the local area, $\gamma$, to serve this dual purpose.

Figure 6. Interaction plots of significant factors from the first designed experiment.

Upon analysis of the remaining interactions, the interaction between $\Phi$ and $\alpha$ was the only one found to be insightful. When $\alpha$ is high, a higher charging capacity $\Phi$ results in a greater expected total reward. However, when $\alpha$ is low, the charging capacity does not have a significant effect on the expected total reward. This is predominantly driven by the lack of discharging when $\alpha$ is low, which reduces the need for charging capacity $\Phi$. Upon further inspection of the policies for the different levels of $\Phi$ and $\alpha$, there were some interesting trends in relation to $\rho$. When $\alpha < 1$, discharging is desirable only when swapping is not desirable (i.e., $\rho$ is below some threshold). However, when $\rho$ is above this same threshold, discharging never occurs when $\alpha < 1$, even when $\Phi$ is high. For this experiment, the thresholds for $\rho$ were \$3.71, \$2.16, \$2.16, and \$1.78 for Winter, Spring, Summer, and Fall, respectively. These all fall below the mean charging costs, which are \$9.58, \$2.86, \$4.80, and \$2.17. Further, an oscillation between charging and discharging occurs when $\alpha > 1$ regardless of the charging capacity $\Phi$ or the swap revenue $\rho$, and little demand for swaps is met. These trends should be particularly informative to the power company. Even if the swap station has sufficient charging infrastructure, it is not incentivized to discharge if it earns a discounted rate ($\alpha < 1$), as long as $\rho$ is set appropriately. Further, when the incentive to discharge is too high, a negative behavior occurs that may amplify the fluctuations in the load on the power grid, regardless of the charging infrastructure at the swap station.

The analysis proceeds by further examining the optimal policies for different scenarios. Figure 7 illustrates the optimal policies for a scenario with $M = 50$, $\Phi = M$, $\rho = 15$, $\gamma = 3{,}000$, and $K_t =$, differentiated by three values of $\alpha$. For a typical Wednesday, Figures 7a and 7b show the optimal policies in 4-hour increments, and Figure 7c shows two consecutive hours. It can be seen visually that the swap station never discharges when $\alpha = 0.75$, as the policy never drops below zero in the grayed area of the figure. When $\alpha = 1$, discharging occurs when the number of full batteries at the swap station appears to be above some threshold (between 25 and 35 full batteries). When $\alpha = 1.25$, the optimal policy alternates between charging and discharging every hour when the swap station has between about 10 and 45 fully charged batteries.

Taking a closer look at this phenomenon, the impact of $\alpha$ on the amount of swap demand met at a swap station is examined. Figure 8 depicts the ceiling of the expected demand, $\lceil \lambda_t \rceil$, when demand follows a Poisson process, compared to the number of batteries the swap station is able to swap when the optimal policy is implemented and the initial state is $M$. When $\alpha = 1.25$, the oscillating behavior between charging and discharging seen in Figure 7c prevents the satisfaction of most demand. Further, even when discharging never occurs ($\alpha = 0.75$), much demand is left unsatisfied. The next designed experiment is performed to identify the relationship between the total number of batteries $M$ and the demand in a local area needed to ensure that some level of demand is met.


(a) $\alpha = 0.75$ (b) $\alpha = 1$ (c) $\alpha = 1.25$

Figure 7. Optimal policy by percentage of the charge cost earned for discharging, $\alpha$.

(a) $\alpha = 0.75$ (b) $\alpha = 1$ (c) $\alpha = 1.25$

Figure 8. Expected swap demand and met demand by percentage of the charge cost earned for discharging, $\alpha$.


From this analysis of $\alpha$, it is concluded that $\alpha = 1$ is best for maintaining the dual purpose of the swap station: meeting swap demand while still exhibiting some favorable V2G discharging behavior. With $\alpha = 1$, the money the swap station earns from discharging exactly equals the cost of charging a battery. Thus, further analysis focuses on scenarios with $\alpha = 1$ to arrive at policy insights.

Next, the state of the system when operating under the optimal policy, that is, the number of fully charged batteries the swap station has on hand, is illustrated throughout a typical week and day for a swap station with $M = 50$, $\Phi = M$, $\rho = 5$, $\gamma = 3{,}000$, $\alpha = 1$, and $K_t = K_f$. To do this, three sample paths of observed demand at the swap station are generated. In the first sample path, the demand observed at the swap station is exactly the ceiling of the mean arrival rate, $\lceil \lambda_t \rceil$, when demand follows a Poisson process. Monte Carlo simulation is used to generate two more sample paths of observed demand at each decision epoch. A Mersenne Twister pseudorandom number generator is used to generate a random number $R_t$ between 0 and 1 for each decision epoch, and the battery swap demand is then calculated from the cumulative probability distribution of demand: the observed demand $x_t$ is chosen so that $P(D_t \le x_t)$ equals $R_t$, where $D_t \sim \text{Poisson}(\lambda_t)$ (because $D_t$ is discrete, this means the smallest $x_t$ with $P(D_t \le x_t) \ge R_t$). The state at the next decision epoch $t + 1$ is calculated from the optimal decision rule $d_t^*(s_t)$ for the current state $s_t$ and the observed demand $x_t$ by way of

$$s_{t+1} = s_t + d_t^*(s_t) - \min\{x_t,\; s_t - |\min\{d_t^*(s_t), 0\}|\}. \tag{92}$$
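The sample-path procedure can be sketched in Python, whose standard `random` module is itself a Mersenne Twister generator. The sketch uses inverse-transform sampling for the Poisson demand and applies Equation (92) for the state update. The function names and the example policy are hypothetical and are not the optimal policy from the thesis. That policy charges as many batteries as a capacity of 10 allows, with $M = 50$ and a constant $\lambda_t = 4$.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def poisson_inverse_cdf(r: float, lam: float) -> int:
    """Smallest x with P(D <= x) >= r for D ~ Poisson(lam); suited to moderate lam."""
    x, p = 0, math.exp(-lam)
    cdf = p
    while cdf < r and p > 0.0:      # stop if the pmf underflows (r extremely close to 1)
        x += 1
        p *= lam / x
        cdf += p
    return x


def next_state(s: int, action: int, demand: int) -> int:
    """Eq. (92): batteries being discharged (action < 0) cannot also be swapped."""
    available = s - abs(min(action, 0))
    return s + action - min(demand, available)


def simulate_path(policy: Callable[[int, int], int], lams: Sequence[float], M: int,
                  seed: int) -> Tuple[List[int], List[int]]:
    """Return visited states and swaps for one demand sample path starting from s_1 = M."""
    rng = random.Random(seed)       # CPython's random module uses the Mersenne Twister
    s, states, swaps = M, [M], []
    for t, lam in enumerate(lams, start=1):
        a = policy(t, s)
        x = poisson_inverse_cdf(rng.random(), lam)    # observed demand x_t from R_t
        swaps.append(min(x, s - abs(min(a, 0))))
        s = next_state(s, a, x)
        states.append(s)
    return states, swaps


# Hypothetical policy: charge as many empty batteries as a capacity of 10 allows.
states, swaps = simulate_path(lambda t, s: min(50 - s, 10), [4.0] * 24, M=50, seed=1)
print(states[:6], swaps[:5])
```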

Assuming the swap station starts with all batteries full, an entire week is examined first, and then a specific day in more detail. The state of the system at each decision epoch and the corresponding optimal action can be seen in Figure 9 for an entire week. From this figure, note that the assumption that the swap station starts the time horizon with all batteries full is not a simplifying assumption, as the number of full batteries naturally increases at the start of each day. Also note that the state and the action taken are relatively consistent across the three observed sample paths of demand. This is a nice result, as it appears that the actions taken in response to the state keep the system balanced. Similar results for Wednesday can be seen in Figure 10.


(a) State versus time. (b) Action versus time.

Figure 9. State and action over a week time period for three simulated observed demands.

(a) State versus time. (b) Action versus time.

Figure 10. State and action over one Wednesday for three simulated sample paths.


For Wednesday, the number of swaps occurring for each sample path of demand is examined further. Figure 11 shows this relationship for the first sample path of observed demand, which equals the expected demand, and for the two Monte Carlo simulations. The number of batteries swapped is consistent even as the sample path of demand changes. Further, it can again be seen that, for this particular scenario, there are not enough batteries at the swap station to consistently meet demand. This is compounded by the fact that unmet demand is not penalized in this model.

Figure 11. Demand throughout a typical Wednesday.

Based on the insights drawn from the first experiment, the next experiment is performed to determine what the swap station should use for its controllable parameters when $\alpha = 1$ and one season is considered. Specifically, the aim is to determine the number of batteries $M$, the charging capacity $\Phi$, and the swap cost $\rho$ in relation to the non-controllable parameters. With this, the focus shifts from the expected total reward to the amount of demand that is met. A second Latin hypercube designed experiment with 40 scenarios is performed. The response variable for this experiment is the percentage of demand met over the entire week when the optimal policy is implemented, the initial state of the system is $M$ fully charged batteries, and the demand is set to $\lceil \lambda_t \rceil$. For this experiment, only $\alpha = 1$ is considered, and for simplification only the charging costs for the week of April 15-21 (Spring) are used. The seasonal charging cost $K_t$ is not statistically significant with respect to the expected total reward and the percentage of demand that is met. Thus, the design consists of four factors. Using JMP 11 Pro software, a 40-scenario design is generated with the levels of each factor ranging between two values. The high and low levels used for this experiment are shown in Table 2. This experiment is again run for both geometric and Poisson demand, and when demand follows a Poisson process both a monotone policy and an optimal policy are found. For these 40 scenarios, the monotone policy is always optimal, and the expected total rewards for all scenarios are identical regardless of the demand distribution used. The policies differ, however, indicating that there are multiple optimal solutions. The results presented are for demand following a Poisson process with a monotone nonincreasing policy.

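Table 2. Factor levels used for the second Latin hypercube designed experiment.

| Factor | Symbol | Low | High |
|---|---|---|---|
| Total Number of Batteries | $M$ | 25 | 100 |
| Charging Capacity | $\Phi$ | $\lfloor 0.25M \rfloor$ | $M$ |
| Swap Revenue (dollars) | $\rho$ | 2 | 20 |
| PHEVs in the Local Area | $\gamma$ | 500 | 3,000 |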

The results from the second Latin hypercube designed experiment can be seen in Table 3. Statistically significant factors at the 95% confidence level with respect to the percentage of met demand include the number of batteries, $M$, and the number of PHEVs in the local area, $\gamma$. This supports the intuition that the number of batteries must be sufficient, relative to the number of PHEVs in the local area, to meet demand. Note that the charging capacity, $\Phi$, and the swap revenue, $\rho$, are not significant factors with respect to the percentage of demand that is met, as long as $\rho$ is above some threshold.

In scenario 34, where $\rho = 2$, the swap station meets only 0.08% of demand. In this scenario, the optimal policy charges batteries only when there are zero fully charged batteries. Thus, if $\rho$ is set too low, the swap station does not have enough incentive to keep fully charged batteries available for swapping and instead discharges to earn a profit. In all other scenarios at least 59.07% of demand is met, including scenario 7, where $\rho = 2.92$ and 98.45% of demand is met. This indicates that $\rho$ must be set above some threshold, relative to the charging costs $K_t$, for the monetary incentive to meet demand to outweigh the opportunity cost of not discharging batteries. Once this threshold is met, increasing $\rho$ does not appear to increase the percentage of demand that is met.


(a)  Scenario  3  (b)  Scenario  22  (c)  Scenario  38 

Figure  12.  Expected  demand  compared  to  the  number  of  batteries  swapped  on  a 
Wednesday  for  3  scenarios  from  the  Latin  hypercube  experiment. 


The demand and battery swaps on a typical Wednesday for three scenarios are
illustrated in Figure 12. In scenario 3, 59.07% of demand is met, indicating that
60 batteries are not enough to meet demand in a location with 2,872 PHEVs.
Scenario 22, where 77.32% of demand is met, indicates that 81 batteries are not
quite enough to meet demand when there are 1,718 PHEVs in the local area. In
scenario 38, 99.07% of demand is met, indicating that 73 batteries are sufficient in
a location with 885 PHEVs. Across all scenarios, if M > 6%γ, then more than 95% of
demand is consistently met.
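The rule of thumb above is stated in terms of the ratio of batteries to local PHEVs, M/γ. The short Python sketch below computes that ratio for the three Figure 12 scenarios, using the (M, γ, met demand) values from Table 3. The helper name `battery_ratio` is hypothetical and is included only for illustration.

```python
# (M, gamma, percent of demand met) for the three Figure 12 scenarios, from Table 3.
FIGURE_12_SCENARIOS = {
    3: (60, 2872, 59.07),
    22: (81, 1718, 77.32),
    38: (73, 885, 99.07),
}


def battery_ratio(num_batteries: int, num_phevs: int) -> float:
    """Return M / gamma (swap-station batteries per local PHEV) as a percentage."""
    if num_phevs <= 0:
        raise ValueError("num_phevs must be positive")
    return 100.0 * num_batteries / num_phevs


for scenario, (m, gamma, met) in FIGURE_12_SCENARIOS.items():
    print(f"Scenario {scenario}: M/gamma = {battery_ratio(m, gamma):.2f}%, "
          f"demand met = {met:.2f}%")
```

The ratios are about 2.09%, 4.71%, and 8.25%. Only scenario 38 exceeds the 6% level, and it is also the only one of the three that meets nearly all demand.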

Next, the benchmark policies are examined to assess their accuracy and speed.
For all scenarios in the second Latin hypercube experiment, the stationary benchmark
policy (SBM) is tested with a parameter value of 0.5 and the dynamic benchmark policy
(DBM) with parameter values of 0.25 and 0.75. These values were selected so that at
any point in time the swap station should have approximately half of its batteries full
and available for swapping. For all tests, the computation time, expected total
reward, and expected percentage of met demand are compared to those of an optimal
policy found via the monotone backward induction algorithm (BI). An optimality gap is
calculated from the optimal expected total reward $v^*(s_1)$ and the expected total
reward $v^{\pi}(s_1)$ of policy $\pi$ using Equation (93), where an optimality gap of
0.00% indicates that an optimal solution has been found.

$$\text{Optimality Gap} = \frac{v^*(s_1) - v^{\pi}(s_1)}{v^*(s_1)} \times 100\% \qquad (93)$$

The expected percentage of demand met is compared by calculating a demand
gap, equal to the percentage of demand met under the optimal policy minus the
percentage met under the benchmark policy. A positive demand gap indicates that
the optimal policy meets more demand, whereas a negative demand gap indicates that
the benchmark policy meets more demand. The optimality gaps, demand gaps, and
elapsed computation times needed to arrive at each policy are given in Table 3.
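The two comparison metrics can be written as two small functions. This is a minimal Python sketch. It assumes that the optimality gap of Equation (93) is the relative shortfall of a policy's expected total reward from the optimal one, in percent. The function names are hypothetical and the reward values 1000 and 900 are made up. The demand-gap call uses scenario 14's optimal met demand of 63.99%. Its stationary-benchmark value of 93.56% is implied by the reported demand gap of -29.57.

```python
def optimality_gap(v_opt: float, v_policy: float) -> float:
    """Relative gap, in percent, between v*(s1) and a policy's expected total reward v^pi(s1)."""
    if v_opt == 0:
        raise ValueError("the optimality gap is undefined when v*(s1) = 0")
    return 100.0 * (v_opt - v_policy) / v_opt


def demand_gap(met_opt_pct: float, met_policy_pct: float) -> float:
    """Percent of demand met by the optimal policy minus that met by a benchmark policy.

    Positive: the optimal policy meets more demand. Negative: the benchmark meets more.
    """
    return met_opt_pct - met_policy_pct


print(optimality_gap(1000.0, 900.0))       # 10.0
print(optimality_gap(1000.0, -100.0))      # 110.0
print(round(demand_gap(63.99, 93.56), 2))  # -29.57
```

Under this definition, with a positive optimal reward, a gap above 100% can occur only when the policy's expected total reward is negative. Scenario 34 in Table 3 reports gaps of this kind.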

All three solution methods require probability transition matrices and reward
vectors. The average computation time for creating the probability matrices and
reward vectors was 249.40 and 4.67 seconds, respectively. Computations were done
using MATLAB R2014a on a laptop with a 2.4 GHz Intel Core i5 processor and 4 GB
of 1600 MHz DDR3 memory.

Disregarding scenario 34, with its unrealistically low swap revenue of ρ = $2, the
stationary benchmark policy is on average 13.08% from optimal, with a range of
1.11% to 28.85%. The demand gaps indicate that on average this benchmark policy
increases the percentage of met demand by 5.16%. At best, the stationary benchmark
policy increases met demand by 29.57%, and at worst it decreases met demand by
26.11%. The dynamic benchmark policy is on average 6.45% from optimal, with a best
case of 0.56% and a worst case of 14.42%. This benchmark policy meets on average
only 0.28% less demand than the optimal policy; in the best case met demand is
increased by 20.34%, and in the worst case met demand is decreased by 28.58%.
Table 3. Results from the second Latin hypercube designed experiment.

                                        Met       Time (s)            Optimality Gap           Demand Gap (%)
Scenario     M    Φ      γ   ρ ($)   Demand     BI    SBM    DBM     BI      SBM      DBM     BI      SBM      DBM
------------------------------------------------------------------------------------------------------------------
       1    58   20   1846    4.77   74.20%   2.24   0.15   0.16  0.00%   10.28%    4.96%   0.00   -13.47    -4.66
       2    48   22   1974   18.15   71.64%   2.03   0.42   0.14  0.00%   10.34%    5.17%   0.00    -1.75     3.88
       3    60   18   2872   13.08   59.07%   2.30   0.19   0.17  0.00%    9.08%    4.57%   0.00    -6.72     0.98
       4    27   21   1077   15.38   93.33%   1.41   0.07   0.08  0.00%   10.12%    4.67%   0.00    17.23    23.29
       5    62   45    756   14.46   99.64%   3.20   0.17   0.19  0.00%   21.82%   10.56%   0.00    -0.36     0.24
       6    42   32    692    7.54   99.36%   2.21   0.11   0.12  0.00%   16.99%    8.09%   0.00    -0.64     3.86
       7    63   32    821    2.92   98.45%   2.47   0.17   0.17  0.00%    7.24%    3.50%   0.00    -1.55    -0.89
       8    44   28   2936   17.69   59.69%   2.77   0.12   0.13  0.00%    7.01%    3.51%   0.00     9.23    12.47
       9    75   31    628   16.77   99.86%   2.19   0.20   0.23  0.00%   25.73%   12.52%   0.00    -0.14    -0.14
      10    92   63   2615   10.77   69.31%   7.47   0.36   0.27  0.00%   13.05%    6.52%   0.00   -24.31   -15.38
      11    65   32   2808    5.69   62.15%   3.48   0.19   0.20  0.00%    8.80%    4.40%   0.00    -9.77    -4.66
      12    56   47   2744    8.92   65.46%   4.11   0.17   0.16  0.00%    8.69%    4.34%   0.00     0.92     5.40
      13    38   23   1590    3.85   79.83%   1.99   0.10   0.11  0.00%    7.80%    3.69%   0.00     7.70    13.72
      14    94   36   2679   14.92   63.99%   3.93   0.27   0.26  0.00%   13.36%    6.54%   0.00   -29.57   -20.34
      15    85   63   1269   20.00   92.96%   5.38   0.24   0.25  0.00%   19.98%    9.99%   0.00    -7.04    -5.56
      16   100   63   1782    6.15   79.94%   5.94   0.29   0.36  0.00%   15.45%    7.73%   0.00   -20.06   -15.84
      17    50   24   1141    9.85   90.66%   2.21   0.15   0.18  0.00%   14.86%    7.13%   0.00    -8.69     0.00
      18    54   48   1462   19.54   85.77%   4.41   0.16   0.19  0.00%   14.03%    6.76%   0.00    -9.25     0.32
      19    79   75   2038   12.62   77.43%   6.81   0.21   0.24  0.00%   13.93%    6.78%   0.00   -19.93   -11.10
      20    96   79    564   14.00  100.00%   6.16   0.26   0.29  0.00%   28.85%   14.42%   0.00     0.00     0.00
      21    40   10   1910   11.69   66.32%   1.30   0.11   0.12  0.00%    9.04%    4.98%   0.00     0.85    10.00
      22    81   29   1718   19.08   77.32%   3.01   0.22   0.23  0.00%   16.23%    8.12%   0.00   -22.63   -16.31
      23    98   52   1397   13.54   89.24%   5.05   0.38   0.29  0.00%   20.06%    9.83%   0.00   -10.76    -9.20
      24    29   11   1013   16.31   92.50%   1.56   0.13   0.08  0.00%   11.23%    5.61%   0.00     9.41    16.09
      25    88   50    500    8.46  100.00%   3.54   0.31   0.25  0.00%   26.24%   13.12%   0.00     0.00     0.00
      26    33   14   2551    4.31   61.95%   1.65   0.09   0.10  0.00%    5.28%    2.65%   0.00    16.32    20.03
      27    87   58   2359   18.62   70.85%   5.98   0.25   0.25  0.00%   13.82%    6.75%   0.00   -25.01   -16.03
      28    83   71   2423    3.38   77.56%   7.55   0.23   0.25  0.00%    8.15%    3.97%   0.00   -15.29    -6.51
      29    25   22   2231   15.85   66.80%   1.52   0.07   0.07  0.00%    5.24%    2.70%   0.00    26.11    28.58
      30    77   20    949    8.00   95.15%   1.91   0.20   0.23  0.00%   19.49%    9.80%   0.00    -4.85    -4.66
      31    31    8   1205    5.23   79.08%   1.44   0.08   0.09  0.00%    9.08%    4.63%   0.00     1.80    12.02
      32    90   29   2167    7.08   70.04%   3.25   0.25   0.25  0.00%   13.63%    6.66%   0.00   -28.76   -19.88
      33    46   42   1654   10.31   81.69%   3.05   0.13   0.15  0.00%   11.06%    5.29%   0.00     1.10     8.72
      34    37   29   2487    2.00    0.08%   0.32   0.10   0.11  0.00%  109.77%  113.87%   0.00   -51.05   -48.17
      35    71   66   3000   17.23   62.39%   5.66   0.20   0.20  0.00%    9.97%    4.84%   0.00   -10.77    -5.09
      36    35   20   2295   11.23   65.90%   1.91   0.10   0.10  0.00%    6.80%    3.21%   0.00    13.86    17.26
      37    67   37   2103   12.15   74.21%   3.65   0.19   0.20  0.00%   12.14%    5.89%   0.00   -15.02    -6.57
      38    73   73    885    9.38   99.07%   5.06   0.20   0.20  0.00%   20.30%   10.15%   0.00    -0.93    -0.72
      39    69   49   1526    6.62   84.73%   4.55   0.20   0.21  0.00%   13.88%    6.94%   0.00   -15.15    -8.19
      40    52   50   1333    2.46   93.40%   5.04   0.19   0.15  0.00%    1.11%    0.56%   0.00    -3.51     5.62
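The averages and ranges quoted above for the benchmark policies can be checked directly against Table 3. The short Python sketch below reproduces them. The four lists are the SBM and DBM optimality-gap and demand-gap columns transcribed from the table. `summarize` is a hypothetical helper that drops scenario 34 and then takes the mean, minimum, and maximum.

```python
from statistics import mean

# Columns transcribed from Table 3, ordered by scenario 1..40.
SBM_OPT_GAP = [10.28, 10.34, 9.08, 10.12, 21.82, 16.99, 7.24, 7.01, 25.73, 13.05,
               8.80, 8.69, 7.80, 13.36, 19.98, 15.45, 14.86, 14.03, 13.93, 28.85,
               9.04, 16.23, 20.06, 11.23, 26.24, 5.28, 13.82, 8.15, 5.24, 19.49,
               9.08, 13.63, 11.06, 109.77, 9.97, 6.80, 12.14, 20.30, 13.88, 1.11]
DBM_OPT_GAP = [4.96, 5.17, 4.57, 4.67, 10.56, 8.09, 3.50, 3.51, 12.52, 6.52,
               4.40, 4.34, 3.69, 6.54, 9.99, 7.73, 7.13, 6.76, 6.78, 14.42,
               4.98, 8.12, 9.83, 5.61, 13.12, 2.65, 6.75, 3.97, 2.70, 9.80,
               4.63, 6.66, 5.29, 113.87, 4.84, 3.21, 5.89, 10.15, 6.94, 0.56]
SBM_DEMAND_GAP = [-13.47, -1.75, -6.72, 17.23, -0.36, -0.64, -1.55, 9.23, -0.14, -24.31,
                  -9.77, 0.92, 7.70, -29.57, -7.04, -20.06, -8.69, -9.25, -19.93, 0.00,
                  0.85, -22.63, -10.76, 9.41, 0.00, 16.32, -25.01, -15.29, 26.11, -4.85,
                  1.80, -28.76, 1.10, -51.05, -10.77, 13.86, -15.02, -0.93, -15.15, -3.51]
DBM_DEMAND_GAP = [-4.66, 3.88, 0.98, 23.29, 0.24, 3.86, -0.89, 12.47, -0.14, -15.38,
                  -4.66, 5.40, 13.72, -20.34, -5.56, -15.84, 0.00, 0.32, -11.10, 0.00,
                  10.00, -16.31, -9.20, 16.09, 0.00, 20.03, -16.03, -6.51, 28.58, -4.66,
                  12.02, -19.88, 8.72, -48.17, -5.09, 17.26, -6.57, -0.72, -8.19, 5.62]

EXCLUDED = {34}  # scenario with the unrealistically low swap revenue (rho = $2)


def summarize(column):
    """Return (mean, min, max) of a Table 3 column, skipping excluded scenarios."""
    kept = [v for scenario, v in enumerate(column, start=1) if scenario not in EXCLUDED]
    return mean(kept), min(kept), max(kept)


for name, column in [("SBM optimality gap (%)", SBM_OPT_GAP),
                     ("DBM optimality gap (%)", DBM_OPT_GAP),
                     ("SBM demand gap", SBM_DEMAND_GAP),
                     ("DBM demand gap", DBM_DEMAND_GAP)]:
    avg, lo, hi = summarize(column)
    print(f"{name}: mean {avg:.2f}, min {lo:.2f}, max {hi:.2f}")
```

The printed means are 13.08 and 6.45 for the SBM and DBM optimality gaps, and -5.16 and 0.28 for their demand gaps. A demand gap is optimal minus benchmark. The negative SBM mean therefore corresponds to the stationary benchmark meeting about 5.16% more demand on average.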

These results indicate that the dynamic benchmark policy outperforms the stationary
benchmark policy, with smaller optimality gaps and a comparable amount of met
demand. Further, the dynamic benchmark policy could be a viable option for
implementation at a swap station. It allows for an easy calculation of the number of
batteries to charge and discharge over time based on a target level for each hour of a
week, so all that is needed is a table with 168 numbers, one for each hour of the week.
In contrast, implementing the optimal policy would require a very large lookup table
indexed by state and time. The results and analysis of these computational tests are
summarized in the following policy insights for a PHEV swap station manager and the
power grid.

1. It is essential to keep the number of batteries at a swap station, M, in line with
the number of PHEVs in the local area, γ, in order to meet demand, maximize
expected total reward, and allow discharging back to the power grid using V2G.
From the results, M > 6%γ was observed to be a sufficient value for M.

2. To ensure that the swap station meets demand rather than focusing solely on
discharging to earn revenue, the swap revenue ρ must be set appropriately.
There is a threshold that ρ must exceed to ensure demand is met. Beyond this
threshold, increasing ρ did not appear to further incentivize meeting demand
over discharging. For the experiments conducted, this threshold was less than
the average charging cost K_t for a week.

3. When the incentive to discharge is too high, the swap station oscillates between
charging and discharging in consecutive time periods, which adds variability to
the power grid. Conversely, when the incentive is too low and ρ is set
appropriately, discharging never occurs. When the revenue earned from
discharging equals the cost of charging, a good balance is struck: some
discharging occurs, with limited oscillating behavior.

4. The dynamic benchmark policy, which calculates a target level for each time
period in the time horizon, was superior to the stationary benchmark policy. Its
action is to charge up to, or discharge down to, this time-dependent target level
based on the number of full batteries on hand. In addition to its advantage over
the stationary benchmark policy, it could be a viable policy to implement at a
swap station because of its accuracy with regard to expected total reward and
met demand, and its ease of implementation.

5. Across all scenarios, which varied the number of batteries M, charging capacity
Φ, swap revenue ρ, weekly charging cost K_t, incentive to discharge α, and PHEVs
in the local area γ, the swap station always remained profitable under the model.
Certain combinations of these factors led to greater profitability, but this result
indicates that in all circumstances considered a swap station is a viable,
profitable option for PHEVs.
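The "charge up to or discharge down to a target" rule shared by both benchmark policies is short enough to state as code. The Python sketch below works under stated assumptions. It treats the charging capacity as a per-period limit on batteries charged or discharged, which is an assumption. It sets the stationary target to a fraction of M (0.5 by default), which is also an assumption. It stores the dynamic targets as a plain 168-entry weekly table. The thesis derives those dynamic targets from current and future charging costs, and that procedure is not reproduced here. The `placeholder_targets` values are purely hypothetical. All function names are illustrative.

```python
from typing import Sequence

HOURS_PER_WEEK = 168


def target_action(full_on_hand: int, target: int, num_batteries: int, capacity: int) -> int:
    """Charge up to, or discharge down to, a target number of full batteries.

    Returns a signed count: positive = batteries to charge, negative = batteries
    to discharge, zero = do nothing. Assumes (hypothetically) that at most
    `capacity` batteries are charged or discharged per period and that only
    depleted batteries (num_batteries - full_on_hand) can be charged.
    """
    if full_on_hand < target:
        depleted = num_batteries - full_on_hand
        return min(target - full_on_hand, depleted, capacity)
    if full_on_hand > target:
        return -min(full_on_hand - target, capacity)
    return 0


def stationary_target(num_batteries: int, fraction: float = 0.5) -> int:
    """Stationary benchmark: one target for every period, a fixed fraction of M."""
    return round(fraction * num_batteries)


def dynamic_action(hour_of_week: int, full_on_hand: int, weekly_targets: Sequence[int],
                   num_batteries: int, capacity: int) -> int:
    """Dynamic benchmark: look up this hour's target in a 168-entry weekly table."""
    if len(weekly_targets) != HOURS_PER_WEEK:
        raise ValueError("weekly_targets needs one entry per hour of the week")
    target = weekly_targets[hour_of_week % HOURS_PER_WEEK]
    return target_action(full_on_hand, target, num_batteries, capacity)


# Usage with M = 58 and charging capacity 20 (the values of scenario 1 in Table 3).
print(target_action(10, stationary_target(58), 58, 20))  # 19: charge toward target 29
placeholder_targets = [15 if h % 24 < 12 else 44 for h in range(HOURS_PER_WEEK)]  # hypothetical
print(dynamic_action(2, 30, placeholder_targets, 58, 20))  # -15: discharge down to 15
```

All the dynamic policy needs at run time is the weekly table and the current count of full batteries. This is why it is much lighter to deploy than a state-by-time lookup table for the optimal policy.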
VI. Conclusions


Motivated  by  the  movement  to  make  transportation  cleaner  and  more  efficient,  the 
PHEV-SSMP  is  introduced.  This  problem  considers  the  management  of  operations  at 
a  plug-in  hybrid  electric  vehicle  (PHEV)  swap  station  facing  stochastic,  nonstationary 
demand  for  battery  swaps,  nonstationary  prices  for  charging  depleted  batteries,  and 
nonstationary  prices  for  discharging  fully  charged  batteries  utilizing  V2G  technology. 
Using sequential decision making over a fixed time horizon, the model determines the
number of batteries the swap station should charge and discharge over time in order
to maximize expected total profit.

A Markov decision process model is used in which demand follows a discrete
probability distribution. A finite-horizon model is considered because the problem
data are highly variable with respect to time. In the model, the state of the system,
that is, the number of fully charged batteries on hand, is observed at a certain point in
time, and the swap station manager chooses the number of batteries to charge or
discharge. The action results in an immediate reward, and the system transitions to a
new state.
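The state-action-reward-transition loop just described is solved by backward induction over the finite horizon. A compact Python sketch of that recursion follows, for a toy swap-station MDP. It is not the thesis's exact model, and the following choices are all assumptions made for illustration:

* the order of events within a period;
* a per-period discharge price;
* a per-period charge/discharge limit;
* charged batteries becoming full in the next period;
* a zero terminal value.

```python
from typing import Dict, List, Sequence, Tuple


def backward_induction(
    num_batteries: int,
    capacity: int,
    swap_revenue: float,
    charge_cost: Sequence[float],
    discharge_price: Sequence[float],
    demand_pmfs: Sequence[Dict[int, float]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Finite-horizon backward induction for a toy swap-station MDP.

    State s: full batteries on hand (0..num_batteries).
    Action a: a > 0 charges a depleted batteries (full next period),
              a < 0 discharges -a full batteries, a = 0 does nothing.
    Swaps in period t are served from the full batteries left after
    discharging; the terminal value after the last period is zero.
    Ties are broken toward the smallest (most-discharging) action.
    """
    horizon = len(charge_cost)
    if len(discharge_price) != horizon or len(demand_pmfs) != horizon:
        raise ValueError("price and demand sequences must share one horizon length")

    value = [[0.0] * (num_batteries + 1) for _ in range(horizon + 1)]
    policy = [[0] * (num_batteries + 1) for _ in range(horizon)]

    for t in reversed(range(horizon)):  # step backward from the last period
        for s in range(num_batteries + 1):
            best_val, best_a = float("-inf"), 0
            for a in range(-min(s, capacity), min(num_batteries - s, capacity) + 1):
                charged, discharged = max(a, 0), max(-a, 0)
                available = s - discharged  # full batteries left for swaps
                total = discharged * discharge_price[t] - charged * charge_cost[t]
                for demand, prob in demand_pmfs[t].items():
                    swaps = min(available, demand)
                    next_s = available - swaps + charged
                    total += prob * (swap_revenue * swaps + value[t + 1][next_s])
                if total > best_val:
                    best_val, best_a = total, a
            value[t][s] = best_val
            policy[t][s] = best_a
    return value, policy


# Tiny instance: M = 1 battery, two periods, demand of 0 or 1 with equal probability.
values, policy = backward_induction(1, 1, 10.0, [1.0, 1.0], [2.0, 2.0], [{0: 0.5, 1: 0.5}] * 2)
print(values[0], policy[0])  # [4.0, 7.5] [1, 0]
```

In this instance, charging in the first period pays off when no full battery is on hand, because the charged battery can then be swapped in the second period. The resulting first-period policy [1, 0] happens to be nonincreasing in the state.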

It has been proven that an optimal monotone nonincreasing policy exists when demand
follows a discrete nonincreasing distribution. Therefore, both the backward induction
and monotone backward induction algorithms can be used to find the optimal policy.
Two easy-to-implement benchmark policies were created and empirically compared to
an optimal policy. In the stationary benchmark policy, the swap station maintains a
single target inventory level of fully charged batteries regardless of time of day and
day of week. In the dynamic benchmark policy, the swap station maintains a distinct
target inventory level for each time period, which takes into account current and
future charging costs.

Two Latin hypercube designed experiments were performed to computationally
test the optimal solution method and the two benchmark policies. The first experiment
is conducted to gain overall information about various parameter inputs for the swap
station. Specifically, the incentive that should be given by the power company is
determined, and other statistically significant factors are analyzed. The second
experiment is conducted to gain insight into what the controllable parameters at a
swap station (e.g., number of batteries, swap price) should be set to in relation to the
number of PHEVs in a local area and power prices.

This analysis yields several policy insights: the dynamic benchmark policy is the
better of the two benchmark policies, the number of batteries M is a critical parameter,
and α needs to be set appropriately by the power company to encourage discharging
while avoiding oscillating behavior. Following this work and Widrick et al. [40],
future work should consider how the swap price ρ impacts the demand for swaps in
comparison to at-home charging or a charging station. Further, the uncertainties
regarding power prices, power load, and other renewables should be incorporated
into the state space of the MDP to fully capture the load-balancing potential of a
PHEV swap station.




Bibliography 


1. U.S. Department of Energy, “Secretary Moniz announces nearly $50 million to
advance high-tech, fuel efficient American autos,” January 2014. Last accessed
on November 20, 2014 at http://energy.gov/articles/secretary-moniz-announces-nearly-50-million-advance-high-tech-fuel-efficient-american-autos.

2.  K.  Clement-Nyns,  E.  Haesen,  and  J.  Driesen,  “The  impact  of  charging  plug-in 
hybrid  electric  vehicles  on  a  residential  distribution  grid,”  IEEE  Transactions  on 
Power  Systems,  vol.  25,  no.  1,  pp.  371-380,  2010. 

3.  Z.  Bingliang,  S.  Yutian,  L.  Bingqiang,  and  L.  Jianxiang,  “A  modeling  method 
for  the  power  demand  of  electric  vehicles  based  on  Monte  Carlo  simulation,”  in 
2012  Asia-Pacific  Power  and  Energy  Engineering  Conference,  (Shanghai,  China), 
pp.  1-5,  March  2012. 

4. J. Eyer and G. Corey, “Energy storage for the electricity grid: Benefits and market
potential assessment guide,” tech. rep., Sandia National Laboratories, February
2010.

5. D. Pearson and S. T. Stub, “Better Place’s failure is blow to Renault,” The Wall
Street Journal, May 2013. Last accessed on November 20, 2014 at
http://online.wsj.com/articles/SB10001424127887323855804578507263247107312.

6. P. Abreu, “The world’s only electric sports car: 2010 Tesla Roadster,” April 2010.
Last accessed on November 20, 2014 at http://www.motorauthority.com/news/1044161_the-worlds-only-electric-sports-car-2010-tesla-roadster.




7. Tesla Motors, “Model X, Utility Meet Performance,” August 2014. Last accessed
on November 20, 2014 at http://www.teslamotors.com/modelx.

8. S. Fowler, “Tesla Model 3 to challenge BMW 3 Series - World exclusive,” July 2014.
Last accessed on November 20, 2014 at http://www.autoexpress.co.uk/tesla/87867/tesla-model-3-to-challenge-bmw-3-series-world-exclusive.

9.  Plug-In  Cars,  “Detailed  list  of  electric  cars  and  plug-in  hybrids,”  August  2014. 
Last  accessed  on  November  20,  2014  at  http://www.plugincars.com/cars. 

10. Tesla Motors, “Road trips made easy,” 2014. Last accessed on November 20, 2014
at http://www.teslamotors.com/supercharger.

11. Tesla Motors, “Battery swap,” 2014. Last accessed on November 20, 2014 at
http://www.teslamotors.com/batteryswap.

12.  R.  Sioshansi  and  P.  Denholm,  “The  value  of  plug-in  hybrid  electric  vehicles  as 
grid  resources,”  The  Energy  Journal,  vol.  31,  no.  3,  pp.  1-23,  2010. 

13. M. Peng, L. Liu, and C. Jiang, “A review on the economic dispatch and risk
management of the large-scale plug-in electric vehicles (PHEVs)-penetrated power
systems,” Renewable and Sustainable Energy Reviews, vol. 16, pp. 1508-1515,
April 2012.

14.  J.  Wang,  C.  Liu,  D.  Ton,  Y.  Zhou,  J.  Kim,  and  A.  Vyas,  “Impact  of  plug-in 
hybrid  electric  vehicles  on  power  systems  with  demand  response  and  wind  power,” 
Energy  Policy,  vol.  39,  pp.  4016-4021,  July  2011. 

15. L. Goransson, S. Karlsson, and F. Johnsson, “Integration of plug-in hybrid
electric vehicles in a regional wind-thermal power system,” Energy Policy, vol. 38,
pp. 5482-5492, October 2010.




16. M. L. Puterman, Markov decision processes: Discrete stochastic dynamic
programming. John Wiley & Sons, 2005.

17. A. Ghate and R. L. Smith, “A linear programming approach to nonstationary
infinite-horizon Markov decision processes,” Operations Research, vol. 61, no. 2,
pp. 413-425, 2013.

18. K. Morrow, D. Karner, and J. Francfort, “Plug-in hybrid electric vehicle charging
infrastructure review,” tech. rep., U.S. Department of Energy, Idaho National
Laboratory, November 2008.

19. O. Worley and D. Klabjan, “Optimization of battery charging and purchasing at
electric vehicle battery swap stations,” in 2011 IEEE Vehicle Power and
Propulsion Conference, (Chicago, IL), pp. 1-4, September 2011.

20. S. G. Nurre, R. Bent, F. Pan, and T. C. Sharkey, “Managing operations of
plug-in hybrid electric vehicle (PHEV) exchange stations for use with a smart grid,”
Energy Policy, vol. 67, pp. 364-377, 2014.

21.  H.-Y.  Mak,  Y.  Rong,  and  Z.-J.  Shen,  “Infrastructure  planning  for  electric  vehicles 
with  battery  swapping,”  Management  Science,  vol.  59,  pp.  1557-1575,  July  2013. 

22. X. Tang, N. Liu, J. Zhang, and S. Deng, “Capacity optimization configuration of
electric vehicle battery exchange stations containing photovoltaic power
generation,” in 2012 7th International Power Electronics and Motion Control
Conference, (Harbin, China), pp. 2061-2065, June 2012.

23. C.-H. Zhang, J.-S. Meng, Y.-Z. Cao, X. Cao, Q. Huang, and Q.-C. Zhong, “The
adequacy model and analysis of swapping battery requirement for electric
vehicles,” in 2012 IEEE Power and Energy Society General Meeting, (San Diego,
CA), pp. 1-5, July 2012.




24.  F.  Pan,  R.  Bent,  A.  Berscheid,  and  D.  Izraelevitz,  “Locating  PHEV  exchange 
stations  in  V2G,”  in  2010  First  IEEE  International  Conference  on  Smart  Grid 
Communications,  (Gaithersburg,  MD),  pp.  173-178,  October  2010. 

25. R. Sioshansi, S. H. Madaeni, and P. Denholm, “A dynamic programming
approach to estimate the capacity value of energy storage,” IEEE Transactions on
Power Systems, vol. 29, pp. 395-403, January 2014.

26. D. F. Salas and W. B. Powell, “Benchmarking a scalable approximate dynamic
programming algorithm for stochastic control of multidimensional energy storage
problems,” tech. rep., Department of Operations Research and Financial
Engineering, Princeton, NJ, 2013.

27. W. R. Scott, W. B. Powell, and S. Moazehi, “Least squares policy iteration
with instrumental variables vs. direct policy search: Comparison against optimal
benchmarks using energy storage,” tech. rep., Department of Operations Research
and Financial Engineering, Princeton University, January 2014. Submitted to
INFORMS Journal on Computing.

28. I. Giannoccaro and P. Pontrandolfo, “Inventory management in supply chains:
a reinforcement learning approach,” International Journal of Production
Economics, vol. 78, pp. 153-161, July 2002.

29. D. Zhang and W. L. Cooper, “Revenue management for parallel flights with
customer-choice behavior,” Operations Research, vol. 53, pp. 415-431, May-June
2005.

30. K. K. Yin, H. Liu, and N. E. Johnson, “Markovian inventory policy with
application to the paper industry,” Computers & Chemical Engineering, vol. 26,
pp. 1399-1413, October 2002.




31.  B.  M.  Lewis,  Inventory  control  with  risk  of  major  supply  chain  disruptions.  PhD 
thesis,  Georgia  Institute  of  Technology,  2005. 

32. M. ElHafsi, “Optimal integrated production and inventory control of an
assemble-to-order system with multiple non-unitary demand classes,” European
Journal of Operational Research, vol. 194, pp. 127-142, April 2009.

33.  H.  Scarf,  “The  optimality  of  (s,S)  policies  in  the  dynamic  inventory  problem,” 
Stanford  University  Press,  1960. 

34. T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, “Solving semi-Markov
decision problems using average reward reinforcement learning,” Management
Science, vol. 45, no. 4, pp. 560-574, 1999.

35. H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus, “An adaptive sampling
algorithm for solving Markov decision processes,” Operations Research, vol. 53,
no. 1, pp. 126-139, 2005.

36. National Grid, “Hourly electric supply charges,” 2013. Last accessed on
October 8, 2014 at http://www.nationalgridus.com/niagaramohawk/business/rates/5_hour_charge.asp.

37. Tesla Motors, “Specs,” 2014. Last accessed on November 23, 2014 at
http://www.teslamotors.com/models/specs.

38. Nexant, Inc., A. Liquide, A. N. Laboratory, G. T. Venture, G. T. Institute,
N. R. E. Laboratory, P. N. Laboratory, and T. LLC, “H2A hydrogen delivery
infrastructure analysis models and conventional pathway options analysis results,”
2008. Last accessed on November 26, 2014 at
http://www1.eere.energy.gov/hydrogenandfuelcells/pdfs/nexant_h2a.pdf.




39. D. C. Montgomery, Design and analysis of experiments. New Jersey: John Wiley
& Sons, 8th ed., 2008.


40. R. S. Widrick, S. G. Nurre, and M. J. Robbins, “Optimal policies for the
management of a plug-in hybrid electric vehicle swap station,” tech. rep., Air Force
Institute of Technology, 2014.




Appendix 




